How to avoid Rancher RKE Reconcile warning? - kubernetes

Whenever I set up a Rancher Kubernetes cluster with RKE, the cluster sets up perfectly. However, I'm getting the following warning message:
WARN[0011] [reconcile] host [host.example.com] is a control plane node without reachable Kubernetes API endpoint in the cluster
WARN[0011] [reconcile] no control plane node with reachable Kubernetes API endpoint in the cluster found
(in the above message, host.example.com is a placeholder for my actual host name; this message is given for each controlplane host specified in the cluster.yml)
How can I modify the RKE cluster.yml file or any other setting to avoid this warning?

I don't believe you can suppress this warning since, as you indicate in your comments, the warning is valid on the first rke up command. It is only a warning, and a valid one at that, even though your configuration appears to have a handle on it. If you are worried about the logs, you could perhaps have your log aggregation tool ignore the warning if it is in close proximity to the initial rke up command, or even filter it out. However, I would think twice about filtering blindly on it, as it would indicate a potential issue (if, for example, you thought the control plane containers were already running).

Related

Config DBConfig.ExtraParams not specified for ml-pipeline pod

I have installed Kubeflow using the manifests. After installing ml-pipeline, the pod is in the "CrashLoopBackOff" state. I changed the destinationrule for ml-pipeline, ml-pipeline-ui and ml-pipeline-msql to DISABLE, but no luck. Can anyone help with this?
Thanks in advance.
There are a bunch of possible root causes for this pod's status, but I am going to focus on the most common ones. To choose the correct one for your situation, you need to look at the "describe" output and the log of the pod in the "CrashLoopBackOff" state.
Verify whether the "describe" output says something like "Back-off restarting failed container" and the log says something like "a container name must be specified for …" or "F ml_metadata/metadata_store/metadata_store_server_main.cc:219] Non-OK-status …".
If yes, the problem is usually dynamic volume provisioning, perhaps because no volume provisioner is installed.
On the other hand, verify your cluster's size, because anything less than 8 CPUs is only going to run if you reduce each service's requested CPU in the manifest files.
You do not give details on the affected pod yet, but another option is to try to install Katib only (without Kubeflow or other resources) on your K8s cluster to verify that other Kubernetes resources do not affect this connection. You can use the following URLs for more empirical troubleshooting cases and solutions: Multiple Pods stuck in CrashLoopBackOff, katib-mysql, ml-pipeline-persistenceagent pod keeps crashing.
Finally, just confirm that you followed the correct instructions for the distribution you used to deploy Kubeflow; you can visit the following URL: Kubeflow Distributions

Recovering Kubernetes cluster without certs

I have the following scenario in the lab and would like to see if it's possible to recover. The cluster is broken, but that is expected since I was testing how far I could go with breaking the cluster and still be able to recover.
Env:
Kubernetes 1.16.3
Kubespray
I was experimenting a bit and don't have any data on this cluster but I am still very curious if it's possible to recover. I have a healthy 3 node etcd cluster with the original configuration (all namespaces, workloads, configmaps etc). I don't have the original SSL certs for the control plane.
I removed all nodes from the cluster (kubeadm reset). I have the original manifests and kubelet config and tried to re-init the master nodes. It was more successful than I thought it would be, but not where I want it to be.
After a successful kubeadm init, the kubelet and control plane containers start successfully, but the corresponding pods are not created. I am able to use the kube API with kubectl and see the nodes, namespaces, deployments, etc.
In the kube-system namespace all daemonsets still exist, but the pods won't start with the following message:
49m Warning FailedCreate daemonset/kube-proxy Error creating: Timeout: request did not complete within requested timeout
The kubelet logs the following regarding the control plane pods:
Jul 21 22:30:02 k8s-master-4 kubelet[13791]: E0721 22:30:02.088787 13791 kubelet.go:1664] Failed creating a mirror pod for "kube-scheduler-k8s-master-4_kube-system(3e128801ef687b022f6c8ae175c9c56d)": Timeout: request did not complete within requested timeout
Jul 21 22:30:53 k8s-master-4 kubelet[13791]: E0721 22:30:53.089517 13791 kubelet.go:1664] Failed creating a mirror pod for "kube-controller-manager-k8s-master-4_kube-system(da5cfae13814fa171a320ce0605de98f)": Timeout: request did not complete within requested timeout
During the kubeadm reset/init process I already have some steps so I can get to where I am now (delete serviceaccounts to reset the tokens, delete some configmaps (kubeadm etc.)).
My question is: is it possible to recover the control plane without the certs? And if it's a complicated but still possible process, I would still like to know.
All help appreciated
Henro
is it possible to recover the control plane without the certs.
Yes, you should be able to. The certs 🔏 are required, but they don't have to be the very same ones that you created the cluster with initially. All the certificates, including the CA, can be rotated across the board. The kubelet even supports certificate auto-rotation. The configurations need to match everywhere though, meaning the CA needs to be the same one that created the CSRs, and the cert keys/certs need to be created from the same CSRs. 🔑
Also, all the components need to use the same CA and be able to authenticate with the API server (kube-controller-manager, kube-scheduler, etc.) 🔐. I'm not entirely sure about the logs that you are seeing, but it looks like the kube-controller-manager and kube-scheduler are not able to authenticate and join the cluster. So I would take a look at their cert configurations:
/etc/kubernetes/kube-controller-manager.conf
/etc/kubernetes/kube-scheduler.conf
Also, you will find every PKI component that you need to verify under /etc/kubernetes/pki
✌️

FailedToUpdateEnpoint in kubernetes

I have a Kubernetes cluster with some deployments and pods. I have experienced an issue with my deployments, with error messages like FailedToUpdateEndpoint and "Readiness probe failed".
These errors are unexpected and I have no idea about them. When we analyse our logs, it seems like someone is trying to hack our cluster (not sure about it).
Things to be clear about:
1. Is there any chance someone can illegally access our Kubernetes cluster without having the kubeconfig?
2. Is there any chance, by using the frontend IP, to access our apps and make changes in the cluster configuration (meaning hack the cluster services via the web URL)?
3. Even if the cluster is accessed illegally via the frontend URL, is there any chance to change the configuration in the cluster?
4. Is there any mechanism to detect whether the Kubernetes cluster is in a healthy state or has been hacked by someone?
The above points focus on whether there are any security-related issues with Kubernetes Engine. If not,
Then,
5. I am still working on this to find the reason for these errors. Please provide more information on that: what may be the cause of these errors?
Error Messages:
FailedToUpdateEndpoint: Failed to update endpoint default/job-store: Operation cannot be fulfilled on endpoints "job-store": the object has been modified; please apply your changes to the latest version and try again
The same error happens for all our pods in cluster.
Readiness probe failed: Error verifying datastore: Get https://API_SERVER: context deadline exceeded; Error reaching apiserver: taking a long time to check apiserver

How to change kubelet configuration via kubeadm

I'm fairly new to Kubernetes and trying to wrap my head around how to manage ComponentConfigs in already running clusters.
For example:
Recently I initialized a kubeadm cluster in a test environment running Ubuntu. When I did that, I found CoreDNS in a CrashLoopBackOff, which turned out to be because Ubuntu was configured to use systemd-resolved and so the resolv.conf had a loopback resolver configured. After reading the CoreDNS docs, I found out that a solution would be to change the resolvConf parameter for the kubelet, either via command-line arguments or in the config.
So how would one do this properly in a kubeadm-managed cluster?
Reading this page in the documentation I didn't really get a clue, because it seems to be tailored to initializing a new cluster or joining new nodes.
Of course, in this particular situation I could just use "kubeadm reset" and initialize again with a --config parameter, but that doesn't seem to be the right solution for a running cluster.
So after digging a bit deeper I found several pieces of information:
I could change the /var/lib/kubelet/kubeadm-flags.env on the node directly, but AFAICT this only makes sense for node-specific changes.
There is a ConfigMap in the kube-system namespace named kubelet-config-1.14. This seems promising for upcoming nodes joining the cluster to get the right configuration - but would changing that CM affect the already running Kubelet?
There is a marshalled version of the running config in /var/lib/config/kubelet.yaml that I could change, but AFAIU this would be overridden by the kubelet itself periodically (?) or at least during a kubeadm upgrade.
There seems to be an option to specify a configmap in the node object, to let kubelet dynamically load the configuration from there, but given that there is already an existing configmap it seems more sensible to change that one.
I seemingly had success by some combination of changing the aforementioned CM, running kubeadm upgrade something afterwards and rebooting the machine (since restarting the kubelet did not fix the CoreDNS issue ... but maybe I was too impatient).
So I am now asking:
What is the recommended way to carry out changes to the kubelet configuration (or any other configuration I could affect via kubeadm-config.yaml) that works and is upgrade-safe for cases where the configuration is not node-specific?
And if this involves running kubeadm ... config --config, how do I extract the existing kubeadm config in a way that I can feed it back to kubeadm?
I am entirely happy with pointers to the right documentation, I just didn't find the right clues myself.
TIA
What you are looking for is well described in the official documentation.
The basic workflow for configuring a Kubelet is as follows:
Write a YAML or JSON configuration file containing the Kubelet’s configuration.
Wrap this file in a ConfigMap and save it to the Kubernetes control plane.
Update the Kubelet’s corresponding Node object to use this ConfigMap.
In addition, the DynamicKubeletConfig feature gate is enabled by default starting from Kubernetes v1.11, but some additional steps are needed to activate it. Remember that the kubelet's --dynamic-config-dir flag must be set to a writable directory on the node.
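The three steps above carried through for the resolv.conf case in the question, as an added example that is neither the answer's nor kubeadm's code: detect a loopback resolver in a resolv.conf, write a KubeletConfiguration with resolvConf pointing at the upstream file systemd-resolved keeps at /run/systemd/resolve/resolv.conf, wrap it in a ConfigMap under the "kubelet" key (the key kubeadm's own kubelet-config ConfigMap uses), and emit the Node patch that points spec.configSource at it. The ConfigMap name, node name placeholder and output directory are assumptions. One caveat for readers on newer clusters: dynamic kubelet configuration (spec.configSource) was deprecated in v1.22 and removed in v1.24, so there the KubeletConfiguration from step one goes into kubeadm's kubelet-config ConfigMap and /var/lib/kubelet/config.yaml followed by a kubelet restart instead of the Node patch.

```bash
#!/usr/bin/env bash
# Added example: loopback-resolver check plus the kubelet config, ConfigMap
# and Node patch for the dynamic kubelet config workflow described above.
set -eu

# resolv_is_loopback FILE : 0 if any nameserver in FILE is a loopback address
# (127.0.0.0/8 or ::1). With such an upstream CoreDNS forwards to itself,
# which is the loop that puts it into CrashLoopBackOff.
resolv_is_loopback() {
  awk '
    $1 == "nameserver" { if ($2 ~ /^127\./ || $2 == "::1") loop++ }
    END { exit !(loop > 0) }
  ' "$1"
}

# upstream_resolv : the file with the real upstream servers on a
# systemd-resolved host; falls back to /etc/resolv.conf elsewhere.
upstream_resolv() {
  if [ -r /run/systemd/resolve/resolv.conf ]; then
    echo /run/systemd/resolve/resolv.conf
  else
    echo /etc/resolv.conf
  fi
}

# write_kubelet_config OUT RESOLV : step 1, the KubeletConfiguration object.
write_kubelet_config() {
  cat > "$1" <<EOF
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: $2
EOF
}

# write_configmap OUT NAME CONFIG_FILE : step 2, wrap the config in a
# kube-system ConfigMap under the data key "kubelet".
write_configmap() {
  cat > "$1" <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: $2
  namespace: kube-system
data:
  kubelet: |
$(sed 's/^/    /' "$3")
EOF
}

# node_patch NAME : step 3, the JSON patch for "kubectl patch node";
# kubeletConfigKey tells the kubelet which data key to load.
node_patch() {
  printf '{"spec":{"configSource":{"configMap":{"name":"%s","namespace":"kube-system","kubeletConfigKey":"kubelet"}}}}\n' "$1"
}

main() {
  local out
  out="${1:-$(mktemp -d)}"
  # constructed sample: the stub resolver systemd-resolved writes on Ubuntu
  printf 'nameserver 127.0.0.53\noptions edns0\n' > "$out/resolv.conf.sample"
  if resolv_is_loopback "$out/resolv.conf.sample"; then
    echo "loopback resolver detected; pointing kubelet at $(upstream_resolv)"
    write_kubelet_config "$out/kubelet.yaml" "$(upstream_resolv)"
  else
    write_kubelet_config "$out/kubelet.yaml" /etc/resolv.conf
  fi
  write_configmap "$out/kubelet-configmap.yaml" kubelet-config-resolv "$out/kubelet.yaml"
  node_patch kubelet-config-resolv > "$out/node-patch.json"
  echo "apply: kubectl apply -f $out/kubelet-configmap.yaml"
  echo "then:  kubectl patch node <node-name> -p \"\$(cat $out/node-patch.json)\""
}
main "$@"
```

The resolv.conf check is the one to run on each node before touching cluster-wide config; a node that already resolves through a non-loopback upstream needs nothing, and patching it with a path that does not exist there breaks its kubelet instead.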

worker added but not present in kubectl get nodes

I'm setting up a 2-node Kubernetes system, following the Docker Multi-Node instructions.
My problem is that kubectl get nodes only shows the master, not the worker node as well.
The setup appears to have worked, with all the expected containers running (as far as I know).
I've confirmed that networking works via flannel.
The subnet of the worker node appears in the master's subnet list.
So everything looks good, except the node isn't showing up.
My questions:
Am I right in thinking the worker node should now be visible from 'get nodes'?
Does it matter whether the MASTER_IP used to do the setup was the master node's public IP address, or the docker IP? (I've tried both..)
Where do I start with debugging this?
Any pointers gratefully accepted...
Versions:
Ubuntu Trusty 14.04 LTS on both master and worker
Kubernetes v1.1.4
hyperkube:v1.0.3
Answering my own #cloudplatform question...
It turned out to be a problem in worker.sh in Kubernetes v1.1.4.
kubectl is called with "--hostname-override=$(hostname -i)"
On this machine, that returns the IPv6 address.
The K8s code is trying to turn that into a DNS name, and fails.
So looking at the log file for the kubectl container, we see this:
I0122 15:57:33.891577 1786 kubelet.go:1942] Recording NodeReady event message for node 2001:41c9:1:41f::131
I0122 15:57:33.891599 1786 kubelet.go:790] Attempting to register node 2001:41c9:1:41f::131
I0122 15:57:33.894076 1786 kubelet.go:793] Unable to register 2001:41c9:1:41f::131 with the apiserver: Node "2001:41c9:1:41f::131" is invalid: [metadata.name: invalid value '2001:41c9:1:41f::131': must be a DNS subdomain (at most 253 characters, matching regex [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*): e.g. "example.com", metadata.labels: invalid value '2001:41c9:1:41f::131': must have at most 63 characters, matching regex (([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?: e.g. "MyValue" or ""]
So that's my problem. Take that out and it all works well.
So in answer to my 3 questions:
Yes, the worker node should be visible immediately in 'get nodes'.
I don't think it matters for getting it to work; it may matter for security reasons.
First step after checking that the basic networking is right and the containers are running: look at the log file for the new node's kubectl container.
Update: I wrote this blog post to explain how I got it working http://blog.willmer.org/2016/11/kubernetes-bytemark/
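The validation the API server applied in the log line above, as an added example that is not part of the post: a check that a candidate node name is a valid DNS subdomain (the regex quoted in the error, plus the 253-character limit), and a helper that picks a usable --hostname-override value by preferring the lower-cased FQDN and falling back to the first IPv4 address, since an IPv6 literal such as the one in the log can never satisfy the rule. The preference order and the sample names are assumptions.

```bash
#!/usr/bin/env bash
# Added example: node-name validation matching the API server's DNS subdomain
# rule, and a fallback for hosts where "hostname -i" returns an IPv6 address.
set -eu

# is_dns_subdomain NAME : 0 if NAME is a valid RFC 1123 subdomain the way the
# API server checks metadata.name (lower-case, at most 253 chars, labels of
# [a-z0-9-] that start and end alphanumeric).
is_dns_subdomain() {
  [ -n "$1" ] && [ "${#1}" -le 253 ] || return 1
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

# normalize_hostname NAME : lower-case and drop a trailing dot, since the
# subdomain rule is lower-case only and a trailing dot fails the regex.
normalize_hostname() {
  printf '%s\n' "$1" | tr 'A-Z' 'a-z' | sed 's/\.$//'
}

# first_ipv4 ADDRS... : first argument that looks like a dotted-quad IPv4;
# anything containing a colon is treated as IPv6 and skipped.
first_ipv4() {
  for a in "$@"; do
    case "$a" in
      *:*) continue ;;
      *.*.*.*) printf '%s\n' "$a"; return 0 ;;
    esac
  done
  return 1
}

# pick_node_name FQDN ADDRS... : the normalised FQDN if it is a valid
# subdomain, otherwise the first IPv4 address from ADDRS (assumed order).
pick_node_name() {
  local fqdn
  fqdn="$(normalize_hostname "$1")"; shift
  if is_dns_subdomain "$fqdn"; then
    printf '%s\n' "$fqdn"
  else
    first_ipv4 "$@"
  fi
}

main() {
  # constructed inputs, including the IPv6 address from the log line above
  for n in 2001:41c9:1:41f::131 worker-1.example.com Worker-1 10.0.0.12; do
    if is_dns_subdomain "$n"; then echo "$n: valid"; else echo "$n: invalid"; fi
  done
  # hostname -I is Linux-only; on other systems no addresses are offered
  echo "chosen: $(pick_node_name "$(hostname -f 2>/dev/null || hostname)" $(hostname -I 2>/dev/null || true))"
}
main
```

Run is_dns_subdomain on whatever worker.sh is about to pass as --hostname-override before starting the node; a failing name shows up only in the kubelet container's log as the "Unable to register" error above, not in "kubectl get nodes".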