Kubernetes: unable to join a remote master node - kubernetes

Hello I am facing a kubeadm join problem on a remote server.
I want to create a multi-server, multi-node Kubernetes Cluster.
I created a vagrantfile to create a master node and N workers.
It works on a single server.
The master VM is a bridge Vm, to make it accessible to the other available Vms on the network.
I choose Calico as a network provider.
For the Master node this's what I've done:
Using ansible :
Initialize Kubeadm.
Installing a network provider.
Create the join command.
For Worker node:
I execute the join command to join the running master.
I created successfully the cluster on one single hardware server.
I am trying to create regular worker nodes on another server on the same LAN, I ping to the master successfully.
To join the Master node using the generated command.
kubeadm join 192.168.2.50:6443 --token ecqb8f.jffj0hzau45b4ro2
--ignore-preflight-errors all
--discovery-token-ca-cert-hash
sha256:94a0144fe419cfb0cb70b868cd43pbd7a7bf45432b3e586713b995b111bf134b
But it showed this error:
error execution phase preflight: couldn't validate the identity of the API Server:
could not find a JWS signature in the cluster-info ConfigMap for token ID "ecqb8f"
I am asking if there is any specific network configuration to join the remote master node.

It seems token is expired or removed. You can create token manually by running:
kubeadm token create --print-join-command
Use the output as join command.

If you see the output as:
"
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "s1isfw"
To see the stack trace of this error execute with --v=5 or higher
" on a node while joining k8s cluster.
Reason:
This issue arises when the token is expired. TTL for token is 23 hours by default, since the time they've been generated, either when kubeadm init is done or generated separately.
In such a case, you can first check if the token you're using for joining the worker to master can be retrieved by command on master :
kubeadm token list
Steps:
Case 1). if you see NO OUTPUT of the above command, then the best deal is to generate token again from master:
on master execute: kubeadm token create --print-join-command
copy everything and structure if necessary and execute this as a command on worker node.
Check the nodes from master. This worker should now have joined the cluster.
Case 2). if you see an output with
TOKEN, TTL, EXPIRES, USAGES, DESCRIPTION, EXTRA GROUPS.
Check the host entries and pinging among the nodes (master and workers).
(firewall could also cause this.)
use this token again on the workers.
OR go with case 1.
Just wanted to add 1 more thing :
DO NOT USE --ignore-preflight-errors all
as nodes(master to work) commands would show errors later. In my env, I do not use this.

Related

Terraform dial tcp 192.xx.xx.xx:443: i/o timeout error

I am trying to implement CI / CD using GitLab + Terraform to K8S Cluster and K8S Control Plane (Master node) was setup on CentOS
However, Pipeline job fails with the following error
Error: Failed to get existing workspaces: Get "https://192.xx.xx.xx/api/v1/namespaces/default/secrets?labelSelector=tfstate%3Dtrue": dial tcp 192.xx.xx.xx:443: i/o timeout
From the error mentioned above (default/secrets?labelSelector=tfstate%3Dtrue), I assume the error is related to missing 'terraform secret' on default namespace
Example (Terraform secret taken from my Windows)
PS C:\> kubectl get secret
NAME TYPE DATA AGE
default-token-7mzv6 kubernetes.io/service-account-token 3 27d
tfstate-default-state Opaque 1 15h
However, I am not sure which process would create 'tfsecret' or should we create it manually ?
Kindly let me know if I my understanding is wrong and had I missed anything else
EDIT
The issue mentioned above occurred because existing Gitlab-runner was on a different subnet (eg 172.xx.xx.xx instead of 192.xx.xx.xx)
I was asked to use a different Gitlab-runner which runs on the same subnet and now it throws the following error
Error: Failed to get existing workspaces: Get "https://192.xx.xx.xx:6443/api/v1/namespaces/default/secrets?labelSelector=tfstate%3Dtrue": x509: certificate signed by unknown authority
Now, I am bit confused whether the certificate-issue is between GitLab-Runner and Gitlab-Server or Gitlab-Server and K8S Cluster or something else
You have configured Kubernetes as the remote state backend for your Terraform configuration. The error is, that the backend is trying to query existing secrets to determine what workspaces are configured. The x509: certificate signed by unknown authority indicates, that the KUBECONFIG the remote state backend uses does not match the CA of the API server you're connecting to.
If the runners are K8s pods themselves, make sure you provide a KUBECONFIG that matches your target cluster and that the remote state does not configure itself as in-cluster by reading the service account token every K8s pod has - which in most cases will only work for the cluster the pod is running on.
You don't provide enough information to be more specific. But big picture, you have to configure the state backend, and any provider that connect to K8s. Theoretically, the state backend secrets and the K8s resources do not have to be on the same cluster. Meaning, you may have to have different configuration for state backend and K8s providers.

Where is the kubectl command get executed?

I want to know where do we usually execute kubectl command ?
Is it on master node or a different node, because i executed kubectl command from one of the EC2 instances in AWS and master and worker node was completely different ( total 3 node = 1 master and 2 worker node).
And when we create cluster ,does that cluster lies in worker node or it includes master node too?
The kubectl command itself is a command line utility that always executes locally however all it really does is issue commands against a Kubernetes server via its Kubernetes API
Which Kubernetes server it acts against is determined by the local environment the command is run with. This is configured using a "kubeconfig" file which is is read from the KUBECONFIG environment variable (or defaults to the file in in $HOME/.kube/config). For more information see Configuring Access to Multiple Clusters

Azure Service Fabric Cluster returns nothing for code-versions and config-versions

In short: both the "sfctl cluster code-versions" and "sfctl cluster config-versions" return empty arrays. Is this a symptom of a problem with the cluster?
Background: I am attempting to follow the Create a Linux container app tutorial, for learning about Service Fabric; but I have run into a problem when the application upload fails with a timeout.
On investigating this, I found that the other sfctl cluster commands (e.g. sfctl cluster health) all worked and returned useful data - except code-versions and config-versions, which both return an empty array:
$ sfctl cluster code-versions
[]
$ sfctl cluster config-versions
[]
I'm not sure if that's unhealthy, or what kind of data they might be returning.
Other notes:
The cluster is secured with a self-signed certificate; this is installed locally and works correctly, but both the above commands also log a warning:
~/.local/lib/python3.5/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning)
However, the same warning is logged for the other commands (e.g. sysctl cluster health) and doesn't stop them from working.
The cluster is at version 6.4.634.1, on Linux
Service Fabric Explorer shows everything as Healthy: Cluster Health State, System Application Health State, and the 3 nodes.
The Azure portal shows the cluster status as "Baseline upgrade"
Explorer shows the cluster as having Code Version "0.0.0.0"

K8s - add kubeadm to existing cluster

I have the cluster created many time ago without kubeadm (maybe it was kubespray, but the configuration for that also lost).
Is any way exists to add nodes to that cluster or attach kubeadm to current configuration or extend without erasing by kubespray?
If Kubeadm was used to generate the original cluster then you can log into the Master and run kubeadm token generate. This will generate an API Token for you. With this API token your worker nodes will be able to preform an authenticated CSR against your Master to perform a joining request. You can follow this guide from there to add a new node with the command kubeadm join.

kubeadm join failing. Unable to request signed cert

I'm a bit confused by this, because it was working for days without issue.
I use to be able to join nodes to my cluster withoout issue. I would run the below on the master node:
kubeadm init .....
After that, it would generate a join command and token to issue to the other nodes I want to join. Something like this:
kubeadm join --token 99385f.7b6e7e515416a041 192.168.122.100
I would run this on the nodes, and they would join without issue. The next morning, all of a sudden this stopped working. This is what I see when I run the command now:
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for
production clusters.
[preflight] Running pre-flight checks
[tokens] Validating provided token
[discovery] Created cluster info discovery client, requesting info from "http://192.168.122.100:9898/cluster-info/v1/?token-id=99385f"
[discovery] Cluster info object received, verifying signature using given token
[discovery] Cluster info signature and contents are valid, will use API endpoints [https://192.168.122.100:6443]
[bootstrap] Trying to connect to endpoint https://192.168.122.100:6443
[bootstrap] Detected server version: v1.6.0-rc.1
[bootstrap] Successfully established connection with endpoint "https://192.168.122.100:6443"
[csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
failed to request signed certificate from the API server [cannot create certificate signing request: the server could not find the requested resource]
It seems like the node I'm trying to join does successfully connect to the API server on the master node, but for some reason, it now fails to request a certificate.
Any thoughts?
To me
sudo service kubelet restart
didn't work.
What I did was the following:
Copied from master node contents of /etc/kubernetes/* into slave nodes at same location /etc/kubernetes
I tried again "kubeadm join ..." command. This time the nodes joined the cluster without any complaint.
I think this is a temporary hack, but worked!
ok, I just stop and started kubelet on the master node as shown below, and things started working again:
sudo service kubelet stop
sudo service kubelet start
EDIT:
This only seemed to work on time for me.