Recommended way to provide kubernetes metrics-server with its key pair?

Are there any known issues with metrics-server and configmaps? I've tried a zillion things to get it to work but have been unable to. If in my deployment manifest I simply replace "image: k8s.gcr.io/metrics-server-amd64:v0.3.3" with "image: docker.io/alpine", it can read configmap files. But metrics-server throws the following error:
"no such file or directory" when attempting to reference a configmap file, which tends to make me suspect the problem is in metrics-server rather than the k8s environment.
My purpose in doing this is to make the server's public and private keys (--tls-cert-file) available to the container. If a configmap is not the recommended way to provide metrics-server its keys, please let me know what the recommended way is. (In that case I would still be curious why metrics-server cannot mount configmap volumes.)

I figured this out. The problem was a combination of a misleading error message from metrics-server and zero insight into whether or not the container was able to see the files in the volume. In fact the files were there, but the error message made me think they weren't. If you pass --tls-cert-file without also giving --tls-private-key-file (which I was doing just for testing), the error message is "No such file or directory" instead of something more informative, like "Please specify both options together." The metrics-server developers need to change this and save "No such file" for cases when the file actually does not exist or cannot be opened for reading.
Thinking there was no file, I had no way to verify this from within the container, because the image contains only a single binary and no shell. Running "docker export" on the non-running container (not running because metrics-server would bomb out with the error) revealed an empty volume, because kubelet had already unmounted the volumes when stopping the container.
The kubelet logs showed everything OK with the volume, and I could see the files under /var/lib/kubelet/pods/…/. But all indications were that something was wrong, because I had no insight into what the container itself was seeing.
Once I started passing both command-line options for the certs, everything worked.
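For reference, a minimal sketch of one way to provide the key pair (a Secret is generally more appropriate than a configmap for private keys). The secret name metrics-server-certs and the mount path /certs are placeholders, not from any official manifest:

# Deployment fragment (hypothetical names): mount a TLS secret and pass BOTH flags
containers:
  - name: metrics-server
    image: k8s.gcr.io/metrics-server-amd64:v0.3.3
    args:
      - --tls-cert-file=/certs/tls.crt
      - --tls-private-key-file=/certs/tls.key
    volumeMounts:
      - name: certs
        mountPath: /certs
        readOnly: true
volumes:
  - name: certs
    secret:
      secretName: metrics-server-certs  # e.g. created with: kubectl create secret tls metrics-server-certs --cert=... --key=...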

Related

Launching containers in Kubernetes after converting the config from docker compose using kompose crashes with a "not a directory" error

Please help a Kubernetes novice.
I'm trying to convert a docker-compose file into Kubernetes configs using the kompose tool, according to the official instructions: https://kubernetes.io/docs/tasks/configure-pod-container/translate-compose-kubernetes/
The translation of the config itself completes without problems, but when I try to start the containers using kubectl apply, they fail with the following error:
I understand the meaning of the error: the mount cannot be performed, since the target is a file, not a directory. But I can't figure out why. As far as I know, Kubernetes allows you to mount individual files. Does kompose produce a broken conversion, or what?
In the original docker-compose.yml file, the problem area looks like this:
In prometheus-deployment.yaml, after converting with kompose, this is what comes out:
My assumption is that Kubernetes is trying to mount prometheus.yml as a persistentVolume and that this is the problem, but it is only an assumption.
Can you please tell me what I'm doing wrong and what needs to be fixed?
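For context, the usual way to hand a container a single file such as prometheus.yml is to put it in a ConfigMap and mount just that key with subPath, rather than mounting a volume (a directory) over a file path. A sketch of that approach, with illustrative names:

# Hypothetical pod-spec fragment: mount only prometheus.yml as a file via subPath
# (ConfigMap created with: kubectl create configmap prometheus-config --from-file=prometheus.yml)
spec:
  containers:
    - name: prometheus
      image: prom/prometheus
      volumeMounts:
        - name: config
          mountPath: /etc/prometheus/prometheus.yml
          subPath: prometheus.yml
  volumes:
    - name: config
      configMap:
        name: prometheus-config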

Config DBConfig.ExtraParams not specified for ml-pipeline pod

I have installed Kubeflow using the manifests. After installing ml-pipeline, the pod is in "CrashLoopBackOff" state. I changed the DestinationRule for ml-pipeline, ml-pipeline-ui and ml-pipeline-mysql to DISABLE, but no luck. Can anyone help with this?
Thanks in advance.
There are a number of possible root causes for this Pod status, but I will focus on the most common ones. To pick the right one for your particular situation, you will need to look at the "describe" output and the log of the Pod in the "CrashLoopBackOff" state.
Check whether "describe" says something like "Back-off restarting failed container" and the log says something like "a container name must be specified for …" or "F ml_metadata/metadata_store/metadata_store_server_main.cc:219] Non-OK-status …".
If so, the problem is usually dynamic volume provisioning, often because no volume provisioner is installed.
You should also check your cluster's size: anything with fewer than 8 CPUs will only run if you reduce each service's requested CPU in the manifest files.
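As an illustration only (the container name and values here are placeholders, not taken from the Kubeflow manifests), lowering a service's requested CPU looks like this in its manifest:

# Hypothetical fragment: shrink a service's CPU request so the stack fits on a small cluster
spec:
  containers:
    - name: ml-pipeline-api-server
      resources:
        requests:
          cpu: 100m       # reduced from a larger default
          memory: 256Mi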
You have not given details on the affected Pod yet, but another option is to install Katib on its own (without Kubeflow or other resources) on your K8s cluster, to verify that other Kubernetes resources are not interfering. For troubleshooting of similar cases and their solutions, see: Multiple Pods stuck in CrashLoopBackOff, katib-mysql, ml-pipeline-persistenceagent pod keeps crashing.
Finally, confirm that you followed the correct instructions for the distribution you used to deploy Kubeflow; see: Kubeflow Distributions.

One Traefik Pod in Kubernetes fails with error: 'command traefik error: field not found, node: redirect'

I'm running Traefik on a Kubernetes cluster to manage Ingress, which has been running ok for a long time.
I recently implemented Cluster-Autoscaling, which works fine except that on one Node (newly created by the Autoscaler) Traefik won't start. It sits in CrashLoopBackOff, and when I check the Pod's logs I get: [date] [time] command traefik error: field not found, node: redirect.
Google found no relevant results, and the error itself is not very descriptive, so I'm not sure where to look.
My best guess is that it has something to do with the RedirectRegex Middleware configured in Traefik's config file:
[entryPoints.http.redirect]
regex = "^http://(.+)(:80)?/(.*)"
replacement = "https://$1/$3"
Traefik actually still works: I can still access all of my apps from their URLs in my browser, even those on the Node with the dead Traefik Pod.
The other Traefik Pods on other Nodes still run happily, and the Nodes are (at least in theory) identical.
After further googling, I found this on Reddit. It turns out Traefik released v2.0 a few days ago, which is not backwards compatible.
Only this Pod had the issue, because it was the only one for which a new (v2.0) image was pulled (being on the only recently created Node).
I reverted to v1.7 until I have time to fix it properly. I had to update the DaemonSet to use v1.7, then kill the Pod so it could be recreated from the old image.
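For reference, pinning the tag in the DaemonSet looks roughly like this (the container name is whatever your manifest already uses; I'm assuming the tag was previously unpinned):

# Fragment of the Traefik DaemonSet: pin the image so new Nodes don't pull v2.0
spec:
  template:
    spec:
      containers:
        - name: traefik
          image: traefik:1.7   # pinned; an unpinned tag would pull 2.0 on new Nodes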
The devs have a Migration Guide that looks like it may help.
"redirect" is gone but now there is "RedirectScheme" and "RedirectRegex" as a new concept of "Middlewares".
It looks like they are moving to a pipeline approach, so you can define a chain of "middlewares" to apply to an "entrypoint" to decide how to direct it and what to add/remove/modify on packets in that chain. "backends" are now "providers", and they have a clearer, modular concept of configuration. It looks like it will offer better organization than earlier versions.
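As a rough sketch of where the old [entryPoints.http.redirect] block ends up in v2 (assuming the Kubernetes CRD provider; the middleware name is made up), the same redirect becomes a RedirectRegex middleware that you then attach to a router:

apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: http-to-https          # hypothetical name
spec:
  redirectRegex:
    regex: "^http://(.+)(:80)?/(.*)"
    replacement: "https://$1/$3"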

Google Cloud Kubernetes Persistent Volume Claim error in deployment Yaml

I have a persistent volume claim file, which was previously being read by Buildkite in the deployment stage. Recently it has started erroring in the build process with this error:
error: error validating "kube/common/01-redis-volume-claim.yml": error validating data: field
spec.dataSource for v1.PersistentVolumeClaimSpec is required; if you choose to ignore these
errors, turn validation off with --validate=false
I've seen this issue crop up twice recently, and the immediate fix is to add the missing field (spec.dataSource) and set it to null.
My question is: if it was absent in the first place, will setting it to null behave any differently than before?
Based on the documentation, spec.dataSource should have:
name: existing-src-pvc-name
kind: PersistentVolumeClaim
In my opinion, all you need to do is add name and kind to your YAML file and the error should go away.
My question is, if it was absent in the first instance, then will setting it to null be any different than what it was previously?
To answer this question: as far as I can tell, it is happening because you are not creating a new PVC but are likely cloning one.
The volume clone feature was added to support CSI volume plugins only. For details, see volume cloning.
The CSI Volume Cloning feature adds support for specifying existing PVCs in the dataSource field to indicate a user would like to clone a Volume.
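For illustration, a PVC that clones an existing one via dataSource might look roughly like this (the clone's name and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-clone                 # hypothetical name for the new, cloned PVC
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  dataSource:
    name: existing-src-pvc-name     # the existing PVC to clone
    kind: PersistentVolumeClaim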

How to change kubelet configuration via kubeadm

I'm fairly new to Kubernetes and trying to wrap my head around how to manage ComponentConfigs in already running clusters.
For example:
Recently I initialized a kubeadm cluster in a test environment running Ubuntu. When I did that, I found CoreDNS in a CrashLoopBackOff, which turned out to be because Ubuntu was configured to use systemd-resolved, so resolv.conf had a loopback resolver configured. After reading the CoreDNS docs, I found that a solution would be to change the resolvConf parameter for the kubelet, either via command-line arguments or in the config.
So how would one do this properly in a kubeadm-managed cluster?
Reading this page in the documentation I didn't really get a clue, because it seems to be tailored to the case of initializing a new cluster or joining new nodes.
Of course, in this particular situation I could just use "kubeadm reset" and initialize it again with a --config parameter, but that doesn't seem to be the right solution for a running cluster.
So after digging a bit deeper I found several infos:
I could change the /var/lib/kubelet/kubeadm-flags.env on the node directly, but AFAICT this only makes sense for node-specific changes.
There is a ConfigMap in the kube-system namespace named kubelet-config-1.14. This seems promising for upcoming nodes joining the cluster to get the right configuration - but would changing that CM affect the already running Kubelet?
There is a marshalled version of the running config in /var/lib/kubelet/config.yaml that I could change, but AFAIU this would be overridden by kubelet itself periodically (?) or at least during a kubeadm upgrade.
There seems to be an option to specify a configmap in the node object, to let kubelet dynamically load the configuration from there, but given that there is already an existing configmap it seems more sensible to change that one.
I seemingly had success with some combination of changing the aforementioned ConfigMap, running some kubeadm upgrade command afterwards, and rebooting the machine (since restarting the kubelet did not fix the CoreDNS issue ... but maybe I was too impatient).
So I am now asking:
What is the recommended way to carry out changes to the kubelet configuration (or any other configuration I could affect via kubeadm-config.yaml) that works and is upgrade-safe for cases where the configuration is not node-specific?
And if this involves running kubeadm ... config --config, how do I extract the existing kubeadm config in a way that I can feed it back to kubeadm?
I am entirely happy with pointers to the right documentation, I just didn't find the right clues myself.
TIA
What you are looking for is well described in the official documentation.
The basic workflow for configuring a Kubelet is as follows:
Write a YAML or JSON configuration file containing the Kubelet’s configuration.
Wrap this file in a ConfigMap and save it to the Kubernetes control plane.
Update the Kubelet’s corresponding Node object to use this ConfigMap.
In addition, the DynamicKubeletConfig feature gate is enabled by default starting from Kubernetes v1.11, but you need some additional steps to activate it. You need to remember that the Kubelet's --dynamic-config-dir flag must be set to a writable directory on the Node.
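To make step 1 of that workflow concrete for the resolv.conf case, the configuration file could look like the sketch below. The path is the usual systemd-resolved upstream file on Ubuntu, but verify it on your own nodes:

# kubelet-config.yaml (sketch of step 1 above)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf   # point the kubelet at the real upstream resolvers instead of the loopback stub

Steps 2 and 3 then wrap this file in a ConfigMap in the kube-system namespace and point the Node's spec.configSource at it, as described on the documentation page linked above.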