Mount propagation in kubernetes

I am using the mount propagation feature of Kubernetes to check the health of mount points of a certain type. I created a DaemonSet that runs a script which does a simple ls on these mount points. I noticed that new mount points are not getting listed from the pods. Is this the expected behaviour?
volumeMounts:
- mountPath: /host
  name: host-kubelet
  mountPropagation: HostToContainer
volumes:
- name: host-kubelet
  hostPath:
    path: /var/lib/kubelet
Related issue: hostPath containing mounts do not update as they change on the host (#44713)

In brief, mount propagation allows volumes mounted by a container to be shared with other containers in the same pod, or even with other pods on the same node.
Mount propagation of a volume is controlled by the mountPropagation field in Container.volumeMounts. Its values are:
HostToContainer - one-way propagation, from host to container. If you mount anything inside the volume, the container will see it there.
Bidirectional - in addition to propagation from host to container, all volume mounts created by the container will be propagated back to the host, so all containers of all pods that use the same volume will see it as well.
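For illustration, a Bidirectional variant of the question's volumeMounts entry is sketched below; note that Bidirectional propagation additionally requires the container to run privileged, so treat the snippet as an assumption to adapt to your cluster's policy:
volumeMounts:
- mountPath: /host
  name: host-kubelet
  mountPropagation: Bidirectional # also set securityContext.privileged: true on this container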
According to the documentation, the mount propagation feature is in alpha state in v1.9 clusters and is planned to go beta in v1.10.
I've reproduced your case on Kubernetes v1.9.2 and found that it completely ignores the MountPropagation configuration parameter. If you check the current state of the DaemonSet or Deployment, you'll see that this option is missing from the listed YAML configuration:
$ kubectl get daemonset --export -o yaml
If you run a plain Docker container with the mount propagation option, you can see that it works as expected:
docker run -d -it -v /tmp/mnt:/tmp/mnt:rshared ubuntu
Comparing the Docker container configuration with the Kubernetes pod container in the volume mount section, you can see that the last flag (shared/rshared) is missing in the Kubernetes container.
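To double-check which propagation mode a mount point actually has on the host, findmnt from util-linux can print it (a quick diagnostic, assuming the tool is available on the node):
$ findmnt -o TARGET,PROPAGATION /var/lib/kubelet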
This is why it happens on Google Kubernetes Engine clusters, and may happen on clusters managed by other providers:
To ensure stability and production quality, normal Kubernetes Engine clusters only enable features that are beta or higher. Alpha features are not enabled on normal clusters because they are not production-ready or upgradeable.
Since Kubernetes Engine automatically upgrades the Kubernetes control plane, enabling alpha features in production could jeopardize the reliability of the cluster if there are breaking changes in a new version.
Alpha-level feature availability: committed to the main Kubernetes repo; appears in an official release; the feature is disabled by default, but may be enabled by a flag (if you are able to set flags).
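For completeness: on GKE you can still experiment with alpha features by creating a dedicated alpha cluster (a sketch; the cluster name is a placeholder, and note such clusters are not upgradeable, so verify the flag against the current gcloud reference):
$ gcloud container clusters create my-alpha-cluster --enable-kubernetes-alpha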
Before mount propagation can work properly on some deployments (CoreOS, RedHat/CentOS, Ubuntu), mount share must be configured correctly in Docker, as shown below.
Edit your Docker’s systemd service file. Set MountFlags as follows:
MountFlags=shared
Or, remove MountFlags=slave if present. Then restart the Docker daemon:
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
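Alternatively, the same setting can live in a systemd drop-in so it survives package upgrades (a sketch following the usual systemd convention; the drop-in file name is arbitrary):
# /etc/systemd/system/docker.service.d/mount-flags.conf
[Service]
MountFlags=shared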

Related

Where to find the Kubernetes Scheduler Configuration file in local system

I am currently working in a Minikube cluster and looking to change some flags of the Kubernetes scheduler configuration, but I can't find it. The file looks something like:
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
algorithmSource:
  provider: DefaultProvider
...
disablePreemption: true
What is its name and where can I find it?
Posting this answer as a community wiki to set a baseline and to provide additional resources/references rather than giving a definitive solution.
Feel free to edit and expand.
I haven't found the file that you are referencing (KubeSchedulerConfiguration) in minikube.
The minikube provisioning process neither creates it nor references it in the configuration files (there is no --config=PATH parameter in /etc/kubernetes/manifests/kube-scheduler.yaml).
I reckon you could take a look at other Kubernetes solutions where you can configure how your cluster is created (how kube-scheduler is configured). Some of the options are:
Kubernetes.io: Docs: Setup: Production environment: Tools: Kubeadm: Create cluster and also:
Kubernetes.io: Docs: Setup: Production environment: Tools: Kubeadm: Control plane flags
Github.com: Kubernetes sigs: Kubespray
A side note! Both kubespray and minikube use kubeadm as a bootstrapper.
I would also consider creating an additional scheduler that would be responsible for spawning your workload (referenced in the YAML manifests, as sketched below):
Kubernetes.io: Docs: Tasks: Extend Kubernetes: Configure multiple schedulers
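As a minimal sketch, a Pod opts into such an additional scheduler via spec.schedulerName (the scheduler name below is a placeholder for whatever you deploy):
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-second-scheduler # hypothetical name
spec:
  schedulerName: my-second-scheduler # must match the additional scheduler's configured name
  containers:
  - name: app
    image: nginx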
I haven't tested it extensively or over the long term, but I've managed to make kube-scheduler load the YAML manifest that you are referencing.
Disclaimers!
Please consider below example as a workaround!
The method described below is not persistent.
Steps:
Start your minikube instance with the --extra-config
Connect to your minikube instance and edit/add files:
/etc/kubernetes/manifests/kube-scheduler.yaml
newly created KubeSchedulerConfiguration
Delete the failing kube-scheduler Pod and wait for it to be recreated.
Start your minikube instance with the --extra-config
As previously said, you can add extra parameters to $ minikube start that are passed down to the provisioning process.
In this setup you can either pass it with $ minikube start ... or do it manually later on.
$ minikube start --extra-config=scheduler.config="/etc/kubernetes/sched.yaml"
The above parameter adds - --config=/etc/kubernetes/sched.yaml to the command of your kube-scheduler, which will then look for the file at that location.
Connect to your minikube instance ($ minikube ssh) and edit/add files:
Your kube-scheduler will fail, as you've passed an argument (config) pointing to a file that doesn't exist yet. To work around this you will need to:
add: /etc/kubernetes/sched.yaml with your desired configuration (a sketch of this file follows the steps below)
modify: /etc/kubernetes/manifests/kube-scheduler.yaml:
add to volumeMounts:
- mountPath: /etc/kubernetes/sched.yaml
  name: scheduler
  readOnly: true
add to volumes:
- hostPath:
    path: /etc/kubernetes/sched.yaml
    type: FileOrCreate
  name: scheduler
Delete the failing kube-scheduler Pod and wait for it to be recreated.
You will need to redeploy the modified scheduler to get its new config running:
$ kubectl delete pod -n kube-system kube-scheduler-minikube
After some time you should see your kube-scheduler in Ready state.
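For reference, a minimal /etc/kubernetes/sched.yaml mirroring the question's example could look like this (a sketch; the kubeconfig path is the kubeadm/minikube default, and field availability depends on your Kubernetes version):
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf # default scheduler kubeconfig in kubeadm-based setups
disablePreemption: true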
Additional resources:
Kubernetes.io: Docs: Concepts: Scheduling eviction: Kube-scheduler
Kubernetes.io: Docs: Reference: Command line tools reference: Kube-scheduler

Kubernetes Alpha Debug command

I'm trying the debug CLI feature on the 1.18 release of Kubernetes, but I have an issue when I execute the debug command.
I have created a pod as shown below:
kubectl run ephemeral-demo --image=k8s.gcr.io/pause:3.1 --restart=Never
Then, when running this command: kubectl alpha debug -it ephemeral-demo --image=busybox --target=ephemeral-demo
Kubernetes hangs like this:
Defaulting debug container name to debugger-aaaa.
How can I resolve that issue?
It seems that you need to enable the feature gate on all control plane components as well as on the kubelets. If the feature is only partially enabled (for instance, only on kube-apiserver and kube-scheduler), the resources will be created in the cluster, but no containers will be created, so there is nothing to attach to.
In addition to the answer posted by Konstl:
To enable the EphemeralContainers feature gate correctly, add on the master nodes to:
/etc/kubernetes/manifests/kube-apiserver.yaml
/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/kubernetes/manifests/kube-scheduler.yaml
the following line to the container command:
spec:
  containers:
  - command:
    - kube-apiserver # or kube-controller-manager/kube-scheduler
    - --feature-gates=EphemeralContainers=true # <-- add this line
Pods will restart immediately.
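To confirm the flag was picked up once the static pods have restarted, you can inspect the live pod spec (a sketch; replace the pod name suffix "master" with your control plane node's name):
$ kubectl -n kube-system get pod kube-apiserver-master -o jsonpath='{.spec.containers[0].command}'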
For enabling the feature gate for the kubelet, add on all nodes to:
/var/lib/kubelet/config.yaml
the following lines at the bottom:
featureGates:
  EphemeralContainers: true
Save the file and run the following command:
$ systemctl restart kubelet
This was enough in my case to be able to use kubectl alpha debug as explained in the documentation.
Additional useful pages:
Ephemeral Containers
Share Process Namespace between Containers in a Pod
I'm seeing similar behaviour, although I'm running Kubernetes 1.17 with kubectl 1.18. I was under the impression that the feature was added to Kubernetes in 1.16 and to kubectl in 1.18.

Kubernetes local persistent volume cannot mount an existing path [duplicate]

I try, I try, but Rancher 2.1 fails to deploy the "mongo-replicaset" Catalog App, with Local Persistent Volumes configured.
How to correctly deploy a mongo-replicaset with Local Storage Volume? Any debugging techniques appreciated since I am new to rancher 2.
I follow the four ABCD steps below, but the first pod deployment never ends. What's wrong with it? Logs and result screens are at the end. Detailed configuration can be found here.
Note: Deployment without Local Persistent Volumes succeeds.
Note: Deployment with a Local Persistent Volume and the "mongo" image succeeds (without the replica-set version).
Note: Deployment with both mongo-replicaset and a Local Persistent Volume fails.
Step A - Cluster
Create a rancher instance, and:
Add three nodes: a worker, a worker etcd, a worker control plane
Add a label on each node (name one, name two, and name three) for node affinity
Step B - Storage class
Create a storage class with these parameters:
volumeBindingMode: WaitForFirstConsumer (as seen here)
name : local-storage
Step C - Persistent Volumes
Add 3 persistent volumes like this:
type : local node path
Access Mode: Single Node RW, 12Gi
storage class: local-storage
Node Affinity: name one (two for second volume, three for third volume)
Step D - Mongo-replicaset Deployment
From catalog, select Mongo-replicaset and configure it like that:
replicaSetName: rs0
persistentVolume.enabled: true
persistentVolume.size: 12Gi
persistentVolume.storageClass: local-storage
Result
After doing the ABCD steps, the newly created mongo-replicaset app stays indefinitely in the "Initializing" state.
The associated mongo workload contains only one pod instead of three, and this pod has two 'crashed' containers, bootstrap and mongo-replicaset.
Logs
This is the output from the 4 containers of the only running pod. There is no error, no problem.
I can't figure out what's wrong with this configuration, and I don't have any tools or techniques to analyze the problem. Detailed configuration can be found here. Please ask me for more commands results.
Thank you
All this configuration is correct.
It's missing one detail: since Rancher is a containerized deployment of Kubernetes, the kubelets are deployed on each node in Docker containers and don't have access to the OS's local folders.
You need to add a volume bind for the kubelets; that way, k8s will be able to create the mongo pod with that same binding.
In rancher:
Edit the cluster yaml (Cluster > Edit > Edit as Yaml)
Add the following entry under "services" node:
kubelet:
  extra_binds:
  - "/mongo:/mongo:rshared"

How to attach an OpenStack volume to a Kubernetes static pod?

Suppose I bootstrap a single master node with kubelet v1.10.3 in an OpenStack cloud, and I would like to have a "self-hosted" single etcd node for k8s necessities, running as a pod.
Before starting the kube-apiserver component you need a working etcd instance, but of course you can't just run kubectl apply -f or put a manifest in the addon-manager folder, because the cluster is not ready at all.
There is a way to start pods via the kubelet without having a ready apiserver: static pods (YAML Pod definitions usually located at /etc/kubernetes/manifests/). This is how I start "system" pods like the apiserver, scheduler, controller-manager, and etcd itself. Previously I just mounted a directory from the node to persist etcd data, but now I would like to use an OpenStack block storage resource. And here is the question: how can I attach, mount, and use an OpenStack cinder volume to persist etcd data from a static pod?
As I learned today, there are at least 3 ways to attach OpenStack volumes:
The CSI OpenStack cinder driver, which is the pretty new way of managing volumes. It won't fit my requirements, because in static pod manifests I can only declare Pods, not other resources like PVC/PV, while the CSI docs say:
The csi volume type does not support direct reference from Pod and may only be referenced in a Pod via a PersistentVolumeClaim object.
The pre-CSI way to attach volumes: FlexVolume.
FlexVolume driver binaries must be installed in a pre-defined volume plugin path on each node (and in some cases master).
OK, I added those binaries to my node (using this DS as a reference) and added the volume to the pod manifest like this:
volumes:
- name: test
  flexVolume:
    driver: "cinder.io/cinder-flex-volume-driver"
    fsType: "ext4"
    options:
      volumeID: "$VOLUME_ID"
      cinderConfig: "/etc/kubernetes/cloud-config"
and got the following error from kubelet logs:
driver-call.go:258] mount command failed, status: Failure, reason: Volume 2c21311b-7329-4cf4-8230-f3ce2f23cf1a is not available
which is weird, because I am sure this cinder volume is already attached to my CoreOS compute instance.
The last way to mount volumes I know of is the in-tree cinder support, which should work since at least k8s 1.5 and has no special requirements besides the --cloud-provider=openstack and --cloud-config kubelet options.
The YAML manifest part declaring the volume for the static pod looks like this:
volumes:
- name: html-volume
  cinder:
    # Enter the volume ID below
    volumeID: "$VOLUME_ID"
    fsType: ext4
Unfortunately when I try this method I get the following error from kubelet:
Volume has not been added to the list of VolumesInUse in the node's volume status for volume.
I do not know what it means, but it sounds like the node status could not be updated (of course, there is no etcd or apiserver yet). Sadly, it was the most promising option for me.
Are there any other ways to attach an OpenStack cinder volume to a static pod relying on the kubelet only (when the cluster is actually not ready)? Any ideas on what I could be missing or why I got the above errors?
The message Volume has not been added to the list of VolumesInUse in the node's volume status for volume. says that attach/detach operations for that node are delegated to the controller-manager only. The kubelet waits for the attachment to be made by the controller, but the volume doesn't reach the appropriate state because the controller isn't up yet.
The solution is to set the kubelet flag --enable-controller-attach-detach=false to let the kubelet attach, mount, and so on. This flag is set to true by default for the following reasons:
If a node is lost, volumes that were attached to it can be detached by the controller and reattached elsewhere.
Credentials for attaching and detaching do not need to be present on every node, improving security.
In your case, setting this flag to false is reasonable, as it is the only way to achieve what you want.
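As an illustration, the flag sits on the kubelet command line next to the cloud-provider options already in use (a sketch; the exact unit file or drop-in location varies by distribution):
kubelet --cloud-provider=openstack \
  --cloud-config=/etc/kubernetes/cloud-config \
  --enable-controller-attach-detach=false \
  --pod-manifest-path=/etc/kubernetes/manifests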

kubernetes allow privileged local testing cluster

I'm busy testing out Kubernetes on my local PC using https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/docker.md, which launches a dockerized single-node k8s cluster. I need to run a privileged container inside k8s (it runs Docker in order to build images from Dockerfiles).
What I've done so far is add a security context with privileged: true to the pod config, which returns forbidden when trying to create the pod. I know that you have to enable privileged mode on the node with --allow-privileged=true, and I've done this by adding the parameter to step two (running the master and worker node), but it still returns forbidden when creating the pod.
Anyone know how to enable privileged in this dockerized k8s for testing?
Here is how I run the k8s master:
docker run --privileged --net=host -d -v /var/run/docker.sock:/var/run/docker.sock gcr.io/google_containers/hyperkube:v1.0.1 /hyperkube kubelet --api-servers=http://localhost:8080 --v=2 --address=0.0.0.0 --allow-privileged=true --enable-server --hostname-override=127.0.0.1 --config=/etc/kubernetes/manifests
Update: Privileged mode is now enabled by default (both in the apiserver and in the kubelet) starting with the 1.1 release of Kubernetes.
To enable privileged containers, you need to pass the --allow-privileged flag to the Kubernetes apiserver, in addition to the kubelet, when it starts up. The manifest file used to launch the Kubernetes apiserver in the single-node Docker example is bundled into the image (from master.json), but you can make a local copy of that file, add the --allow-privileged=true flag to the apiserver command line, and then change the --config flag you pass to the kubelet in Step Two to a directory containing your modified file.
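For reference, the pod-level setting described above is just this (a minimal sketch; the name and image are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: privileged-builder # hypothetical name
spec:
  containers:
  - name: builder
    image: docker # placeholder image for building from Dockerfiles
    securityContext:
      privileged: true # rejected with "forbidden" unless --allow-privileged=true is set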