rke up error: FATA[0000] Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) - rancher-rke

I am trying to install RKE for Rancher. I ran rke config, which creates cluster.yml
and cluster.rkestate, then ran rke up and got this error:
FATA[0000] Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s) []
How can I fix it?
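This error usually means the nodes section of cluster.yml ended up with no host carrying the etcd role (easy to do if rke config is accepted with blank answers). For reference, a minimal sketch of a node entry that satisfies the etcd plane requirement; the address and user values are placeholders, not taken from the question:
nodes:
  - address: <node-ip>
    user: <ssh-user>
    role: [controlplane, etcd, worker]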

Related

Helm: how to set the Kubernetes cluster endpoint

I have two containers:
one hosting the cluster (minikube)
one where the deployment is triggered (with Helm)
When running helm install I get:
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp 127.0.0.1:8080: connect: connection refused
This is clear, because my cluster is running on a different host. How/where can I set the Kubernetes cluster IP address? When I run helm install, my app should be deployed on the remote cluster.
It can be done with:
helm --kube-context=<context-name>
The steps to create the context are described here.
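A minimal sketch of creating and using such a context; the cluster name, server URL, and credential paths are placeholders/assumptions rather than values from the original answer:
$ kubectl config set-cluster my-remote-cluster --server=https://<cluster-host>:8443 --certificate-authority=/path/to/ca.crt
$ kubectl config set-credentials my-remote-user --client-certificate=/path/to/client.crt --client-key=/path/to/client.key
$ kubectl config set-context my-remote-context --cluster=my-remote-cluster --user=my-remote-user
$ helm install my-release ./my-chart --kube-context my-remote-context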

"Calico CNI - calico-kube-controllers Failed to initialize Calico data store" error

I am running Calico CNI v3.19.1 on Kubernetes v1.19.13, using CentOS Stream 8 and RHEL 8 for this cluster with 3 master and 3 worker nodes.
When calico-kube-controllers starts on a worker node, it fails with the [FATAL][1] main.go 118: Failed to initialize Calico data store error message.
I used the settings below while deploying Calico:
# Auto-detect the BGP IP address.
- name: IP
  value: "autodetect"
- name: IP_AUTODETECTION_METHOD
  value: "interface=en.*"
- name: FELIX_IPTABLESBACKEND
  value: "NFT"
Error message during pod startup:
$ kubectl logs -f calico-kube-controllers-5978c5f6b5-bxbmw
2021-07-26 15:24:21.353 [INFO][1] main.go 92: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0726 15:24:21.356093 1 client_config.go:615] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2021-07-26 15:24:21.357 [INFO][1] main.go 113: Ensuring Calico datastore is initialized
2021-07-26 15:24:31.357 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2021-07-26 15:24:31.357 [FATAL][1] main.go 118: Failed to initialize Calico datastore error=Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
Any tips to resolve this error message?
Thanks
SR
Try this on your master node; it worked for me.
$ sudo iptables -P INPUT ACCEPT
$ sudo iptables -P FORWARD ACCEPT
$ sudo iptables -F
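The underlying failure in the log is a timeout reaching the in-cluster API server service at 10.96.0.1:443, which is why relaxing the host firewall can help. A hedged way to confirm reachability from an affected node (these commands are illustrative assumptions, not part of the original answer):
$ sudo iptables -L -n | head -5                       # check whether the default chain policies are DROP
$ curl -k --max-time 5 https://10.96.0.1:443/version  # any HTTP response, even 401/403, means the service IP is reachable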

Rancher's RKE fails to start on a new cluster

/opt/kubernetes/bin/rke up --config /home/msh/rancher-cluster.yml
The rancher-cluster.yml file contains:
nodes:
  - address: 192.168.10.34
    internal_address: 172.17.0.2
    user: bsh
    role: [controlplane,etcd]
  - address: 192.168.10.35
    internal_address: 172.17.0.3
    user: bsh
    role: [worker]
  - address: 192.168.10.36
    internal_address: 172.17.0.4
    user: bsh
    role: [worker]
add_job_timeout: 120
Note: I have not configured any interface with the internal_address on any of the nodes. My understanding is that Rancher/Kubernetes will set these up for me.
Here's the tail end of rke failing to start.
INFO[0039] Removing container [rke-bundle-cert] on host [192.168.10.34], try #1
INFO[0039] Image [rancher/rke-tools:v0.1.69] exists on host [192.168.10.34]
INFO[0039] Starting container [rke-log-linker] on host [192.168.10.34], try #1
INFO[0040] [etcd] Successfully started [rke-log-linker] container on host [192.168.10.34]
INFO[0040] Removing container [rke-log-linker] on host [192.168.10.34], try #1
INFO[0040] [remove/rke-log-linker] Successfully removed container on host [192.168.10.34]
INFO[0040] [etcd] Successfully started etcd plane.. Checking etcd cluster health
WARN[0055] [etcd] host [192.168.10.34] failed to check etcd health: failed to get /health for host [192.168.10.34]: Get https://172.17.0.2:2379/health: Unable to access the service on 172.17.0.2:2379. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused)
FATA[0055] [etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [192.168.10.34] failed to report healthy. Check etcd container logs on each host for more information
Using:
Rancher v2.5.2
rke version v1.0.16
docker-ce-19.03.14-3.el8.x86_64
From my understanding, the interface configuration has to pre-exist; RKE will not take care of interface configuration.
Therefore, either set up an internal subnet and assign your interfaces to it, or use the external address for the internal communication as well.
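A minimal sketch of the second option, reusing the addresses from the question and simply dropping internal_address so that RKE uses the external address for everything:
nodes:
  - address: 192.168.10.34
    user: bsh
    role: [controlplane,etcd]
  - address: 192.168.10.35
    user: bsh
    role: [worker]
  - address: 192.168.10.36
    user: bsh
    role: [worker]
add_job_timeout: 120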

Error while creating a Kubernetes cluster with kubeadm using a load balancer DNS and port in AWS

I'm trying to create a high-availability cluster. To do that, I followed this procedure:
Created an EC2 instance and added it to the load balancer.
Using that load balancer's DNS name and port, I ran the following command:
sudo kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT" --upload-certs
But the master node is not created, and I am getting this error:
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
When I run systemctl status kubelet,
I get an "ip not found" error.
Can someone please help?
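A hedged set of checks that often narrows this down; LOAD_BALANCER_DNS and LOAD_BALANCER_PORT are the placeholders from the question, and the commands themselves are general suggestions rather than steps from an accepted answer:
$ nslookup LOAD_BALANCER_DNS                       # the control-plane endpoint must resolve from the node
$ nc -zv LOAD_BALANCER_DNS LOAD_BALANCER_PORT      # the load balancer must accept and forward this port to the node's 6443
$ sudo journalctl -xeu kubelet | tail -n 50        # the kubelet log usually names the real failure
$ sudo docker ps -a | grep kube | grep -v pause    # check whether the static control-plane pods ever started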

minikube and how to debug an apiserver error

I don't get what is going on with minikube. Below are the steps I undertook to fix the problem with a stopped apiserver.
1) I don't know why the apiserver stopped. How do I debug it? This folder is empty:
--> EMPTY ~/.minikube/logs/
2) After the stop I start again and minikube says all is well. I do a status check and I get apiserver: Error. So, with no logs, how do I debug?
3) And finally, what would cause an apiserver error?
Thanks
~$ minikube status
host: Running
kubelet: Running
apiserver: Stopped
kubectl: Correctly Configured: pointing to minikube-vm at 192.168.99.100
~$ minikube stop
Stopping local Kubernetes cluster...
Machine stopped.
~$ minikube start
Starting local Kubernetes v1.12.4 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Stopping extra container runtimes...
Machine exists, restarting cluster components...
Verifying kubelet health ...
Verifying apiserver health .....Kubectl is now configured to use the cluster.
Loading cached images from config file.
Everything looks great. Please enjoy minikube!
~$ minikube status
host: Running
kubelet: Running
apiserver: Error
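A hedged starting point for digging further; these are standard minikube and Docker commands for a VM of that era (v1.12.4 with the Docker runtime), not steps from an accepted answer:
$ minikube logs                                         # collects logs from inside the VM even when ~/.minikube/logs is empty
$ minikube ssh -- docker ps -a | grep kube-apiserver    # check whether the apiserver container is crash-looping
$ minikube ssh -- docker logs <apiserver-container-id>  # read why it exited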