How to attach OpenStack volume to a Kubernetes staic pod? - kubernetes

Suppose I bootstrap a single master node with kubelet v1.10.3 in OpenStack cloud and I would like to have a "self-hosted" single etcd node for k8s necessities as a pod.
Before starting kube-apiserver component you need a working etcd instance, but of course you can't just perform kubectl apply -f or put a manifest to addon-manager folder because cluster is not ready at all.
There is a way to start pods by kubelet without having a ready apiserver. It is called static pods (yaml Pod definitions usually located at /etc/kubernetes/manifests/). And it is the way I start "system" pods like apiserver, scheduler, controller-manager and etcd itself. Previously I just mounted a directory from node to persist etcd data, but now I would like to use OpenStack blockstorage resource. And here is the question: how can I attach, mount and use OpenStack cinder volume to persist etcd data from static pod?
As I learned today there are at least 3 ways to attach OpenStack volumes:
CSI OpenStack cinder driver which is pretty much new way of managing volumes. And it won't fit my requirements, because in static pods manifests I can only declare Pods and not other resources like PVC/PV while CSI docs say:
The csi volume type does not support direct reference from Pod and may only be referenced in a Pod via a PersistentVolumeClaim object.
before-csi way to attach volumes is: FlexVolume.
FlexVolume driver binaries must be installed in a pre-defined volume plugin path on each node (and in some cases master).
Ok, I added those binaries to my node (using this DS as a reference), added volume to pod manifest like this:
volumes:
- name: test
flexVolume:
driver: "cinder.io/cinder-flex-volume-driver"
fsType: "ext4"
options:
volumeID: "$VOLUME_ID"
cinderConfig: "/etc/kubernetes/cloud-config"
and got the following error from kubelet logs:
driver-call.go:258] mount command failed, status: Failure, reason: Volume 2c21311b-7329-4cf4-8230-f3ce2f23cf1a is not available
which is weird because I am sure this Cinder volume is already attached to my CoreOS compute instance.
and the last way to mount volumes I know is cinder in-tree support which should work since at least k8s 1.5 and does not have any special requirements besides --cloud-provider=openstack and --cloud-config kubelet options.
The yaml manifest part for declaring volume for static pod looks like this:
volumes:
- name: html-volume
cinder:
# Enter the volume ID below
volumeID: "$VOLUME_ID"
fsType: ext4
Unfortunately when I try this method I get the following error from kubelet:
Volume has not been added to the list of VolumesInUse in the node's volume status for volume.
Do not know what it means but sounds like the node status could not be updated (of course, there is no etcd and apiserver yet). Sad, it was the most promising option for me.
Are there any other ways to attach OpenStack cinder volume to a static pod relying on kubelet only (when cluster is actually not ready)? Any ideas on what cloud I miss of got above errors?

Message Volume has not been added to the list of VolumesInUse in the node's volume status for volume. says that attach/detach operations for that node are delegated to controller-manager only. Kubelet waits for attachment being made by controller but volume doesn't reach appropriate state because controller isn't up yet.
The solution is to set kubelet flag --enable-controller-attach-detach=false to let kubelet attach, mount and so on. This flag is set to true by default because of the following reasons
If a node is lost, volumes that were attached to it can be detached
by the controller and reattached elsewhere.
Credentials for attaching and detaching do not need to be made
present on every node, improving security.
In your case setting of this flag to false is reasonable as this is the only way to achieve what you want.

Related

Kubernetes - All PVCs Bound, yet "pod has unbound immediate PersistentVolumeClaims"

Unfortunately I am unable to paste configs or kubectl output, but please bear with me.
Using helm to deploy a series of containers to K8s 1.14.6, all containers are deploying successfully except for those that have initContainer sections defined within them.
In these failing deployments, their templates define container and initContainer stanzas that reference the same persistent-volume (and associated persistent-volume-claim, both defined elsewhere).
The purpose of the initContainer is to copy persisted files from a mounted drive location into the appropriate place before the main container is established.
Other containers (without initContainer stanzas) mount properly and run as expected.
These pods which have initContainer stanzas, however, report "failed to initialize" or "CrashLoopBackOff" as they continually try to start up. The kubectl describe pod of these pods gives only a Warning in the events section that "pod has unbound immediate PersistentVolumeClaims." The initContainer section of the pod description says it has failed because "Error" with no further elaboration.
When looking at the associated pv and pvc entries from kubectl, however, none are left pending, and all report "Bound" with no Events to speak of in the description.
I have been able to find plenty of articles suggesting fixes when your pvc list shows Pending claims, yet none so far that address this particular set of circumstance when all pvcs are bound.
When a PVC is "Bound", this means that you do have a PersistentVolume object in your cluster, whose claimRef refers to that PVC (and usually: that your storage provisioner is done creating the corresponding volume in your storage backend).
When a volume is "not bound", in one of your Pod, this means the node where your Pod was scheduled is unable to attach your persistent volume. If you're sure there's no mistake in your Pods volumes, you should then check logs for your csi volumes attacher pod, when using CSI, or directly in nodes logs when using some in-tree driver.
While the crashLoopBackOff thing is something else. You should check for logs of your initContainer: kubectl logs -c <init-container-name> -p. From your explanation, I would suppose there's some permission issues when copying files over.

NFS mount fails during kubernetes pod initialization if NFS client is set on container level, not on pod level

Setup:
Linux VM where Pod (containing 3 containers) is started.
Only 1 of the containers needs the NFS mount to the remote NFS server.
This "app" container is based Alpine linux.
Remote NFS server is up & running. If I create a separate yaml file for persistent volume with that server info - it's up & available.
In my pod yaml file I define Persistent Volume (with that remote NFS server info), Persistent Volume Claim and associate my "app" container's volume with that claim.
Everything works as a charm if on the hosting linux VM I install the NFS library, like:
sudo apt install nfs-common.
(That's why I don't share my kubernetes yaml file. Looks like problem is not there.)
But that's a development environment. I'm not sure how/where those containers would be used in production. For example they would be used in AWS EKS.
I hoped to install something like
apk add --no-cache nfs-utils in the "app" container's Dockerfile.
I.e. on container level, not on a pod level - could it work?
So far getting the pod initialization error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 35s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 22s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 20s default-scheduler Successfully assigned default/delphix-masking-0 to masking-kubernetes
Warning FailedMount 4s (x6 over 20s) kubelet MountVolume.SetUp failed for volume "nfs-pv" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o hard,nfsvers=4.1 maxTestNfs1.dlpxdc.co:/var/tmp/masking-mount /var/snap/microk8s/common/var/lib/kubelet/pods/2e6b7aeb-5d0d-4002-abba-88de032c12dc/volumes/kubernetes.io~nfs/nfs-pv
Output: mount: /var/snap/microk8s/common/var/lib/kubelet/pods/2e6b7aeb-5d0d-4002-abba-88de032c12dc/volumes/kubernetes.io~nfs/nfs-pv: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
And the process is stuck in that step forever.
Looks like it happens before even trying to initialize containers.
So I wonder if approach of enabling NFS-client on the container's level is valid.
Thanks in ahead for any insights!
I hoped to install something like apk add --no-cache nfs-utils in the "app" container's Dockerfile. I.e. on container level, not on a pod level - could it work?
Yes, this could work. This is normally what you would do if you have no control to the node (eg. you can't be sure if the host is ready for NFS calls). You need to ensure your pod can reach out to the NFS server and in between all required ports are opened. You also needs to ensure required NFS program (eg. rpcbind) is started before your own program in the container.
...For example they would be used in AWS EKS.
EKS optimized AMI come with NFS supports, you can leverage K8S PV/PVC support using this image for your worker node, there's no need to initialize NFS client support in your container.
TL;DR - no, it is not best practice (and not the right way) to mount NFS volumes from inside the container. There is no use-case for it. And it is a huge security risk as well (allowing applications direct access to cluster-wide storage without any security control).
It appears your objective is to provide storage to your container that is backed by NFS, right? Then doing it by creating a PersistentVolume and then using a PersistentVolumeClaim to attach it to your Pod is the correct approach. That you're already doing.
No, you don't have to worry about how will the storage be provided to the container that is because due to the way k8s runs applications, certain conditions MUST be met before a Pod can be scheduled on a Node. One of those conditions is, the volumes the pod is mounting MUST be available. If a volume doesnt exist, and you mount it in a Pod, that Pod will never get scheduled and will possibly be stuck in Pending state. That's what you see as the error as well.
You don't have to worry about the NFS connectivity, because in this case, the Kubernetes PersistentVolume resource you created will technically act like an NFS client for your NFS server. This provides a uniform storage interface (applications don't have to care where the volume is coming from, the application code will be independent of storage type) as well as better security and permission control.
Another note, when dealing with Kubernetes, it is recommended to consider Pod as your 'smallest unit' of infrastructure and not container. So, it is the recommended way to only use 1 application container per pod for simplicity of design and achieving the micro-service architecture in true sense.

Kubernetes: hostPath Static Storage with PV vs hard coded hostPath in Pod Volume

I'm learning Kubernetes and there is something I don't get well.
There are 3 ways of setting up static storage:
Pods with volumes you attach diretctly the storage to
Pods with a PVC attached to its volume
StatefulSets with also PVC inside
I can understand the power of PVC when working together with StorageClass, but not when working with static storage and local storage like hostPath
To me, it sounds very similar:
In the first case I have a volume directly attached to a pod.
In the second case I have a volume statically attached to a PVC, which is also manually attached to a Pod. In the end, the volume will be statically attached to the Pod.
On both cases, the data will remain when the Pod is terminates and will be adopted by the next Pod which the corresponing definition, right?
The only profit I see from using PVCs over plain Pod is that you can define the acces mode. Apart of that. Is there a difference when working with hostpath?
On the other hand, the advantage of using a StatefulSet instead of a PVC is (if understood properly) that it get a headless service, and that the rollout and rollback mechanism works differently. Is that the point?
Thank you in advance!
Extracted from this blog:
The biggest difference is that the Kubernetes scheduler understands
which node a Local Persistent Volume belongs to. With HostPath
volumes, a pod referencing a HostPath volume may be moved by the
scheduler to a different node resulting in data loss. But with Local
Persistent Volumes, the Kubernetes scheduler ensures that a pod using
a Local Persistent Volume is always scheduled to the same node.
Using hostPath does not garantee that a pod will restart on the same node. So you pod can attach /tmp/storage on k8s-node-1, then if you delete and re-create the pod, it may attach tmp/storage on k8s-node-[2-n]
On the contrary, if you use PVC/PV with local persistent storage class, then if you delete and re-create a pod, it will stick on the node which handle the local persistent storage.
StatefulSet creates pods and has volumeClaimTemplate field, which creates a dedicated PVC for each pod. So each pod created by the statefulSet will have its own dedicated storage, linked with Pod->PVC->PV->Storage. So StatefulSet use also the PVC/PV mechanism.
More details are available here.

Kubernetes Persistent Volume Claim FileSystemResizePending

i have a persistent volume claim for a kubernetes pod which shows the message "Waiting for user to (re-)start a pod to finish file system resize of volume on node." if i check it with 'kubectl describe pvc ...'
The rezising itself worked which was done with terraform in our deployments but this message still shows up here and i'm not really sure how to get this fixed? The pod was already restarted several times - i tried kubectl delete pod and scale it down with kubectl scale deployment.
Does anyone have an idea how to get rid of this message?screenshot
There are few things to consider:
Instead of using the Terraform, try resizing the PVC by editing it manually. After that wait for the underlying volume to be expanded by the storage provider and verify if the FileSystemResizePending condition is present by executing kubectl get pvc <pvc_name> -o yaml. Than, make sure that all the associated pods are restarted so the whole process can be completed. Once file system resizing is done, the PVC will automatically be updated to reflect new size.
Make sure that your volume type is supported for expansion. You can expand the following types of volumes:
gcePersistentDisk
awsElasticBlockStore
Cinder
glusterfs
rbd
Azure File
Azure Disk
Portworx
FlexVolumes
CSI
Check if in your StorageClass the allowVolumeExpansion field is set to true.

Kubernetes local persistent volume cannot mount an existing path [duplicate]

I try, I try, but Rancher 2.1 fails to deploy the "mongo-replicaset" Catalog App, with Local Persistent Volumes configured.
How to correctly deploy a mongo-replicaset with Local Storage Volume? Any debugging techniques appreciated since I am new to rancher 2.
I follow the 4 ABCD steps bellow, but the first pod deployment never ends. What's wrong in it? Logs and result screens are at the end. Detailed configuration can be found here.
Note: Deployment without Local Persistent Volumes succeed.
Note: Deployment with Local Persistent Volume and with the "mongo" image succeed (without replicaset version).
Note: Deployment with both mongo-replicaset and with Local Persistent Volume fails.
Step A - Cluster
Create a rancher instance, and:
Add three nodes: a worker, a worker etcd, a worker control plane
Add a label on each node: name one, name two and name three for node Affinity
Step B - Storage class
Create a storage class with these parameters:
volumeBindingMode : WaitForFirstConsumer saw here
name : local-storage
Step C - Persistent Volumes
Add 3 persistent volumes like this:
type : local node path
Access Mode: Single Node RW, 12Gi
storage class: local-storage
Node Affinity: name one (two for second volume, three for third volume)
Step D - Mongo-replicaset Deployment
From catalog, select Mongo-replicaset and configure it like that:
replicaSetName: rs0
persistentVolume.enabled: true
persistentVolume.size: 12Gi
persistentVolume.storageClass: local-storage
Result
After doing ABCD steps, the newly created mongo-replicaset app stay infinitely in "Initializing" state.
The associated mongo workload contain only one pod, instead of three. And this pod has two 'crashed' containers, bootstrap and mongo-replicaset.
Logs
This is the output from the 4 containers of the only running pod. There is no error, no problem.
I can't figure out what's wrong with this configuration, and I don't have any tools or techniques to analyze the problem. Detailed configuration can be found here. Please ask me for more commands results.
Thanks you
All this configuration is correct.
It's missing a detail since Rancher is a containerized deployment of kubernetes.
Kubelets are deployed on each node in docker containers. They don't access to OS local folders.
It's needed to add a volume binding for the kubelets, like that K8s will be able to create the mongo pod with this same binding.
In rancher:
Edit the cluster yaml (Cluster > Edit > Edit as Yaml)
Add the following entry under "services" node:
kubelet:
extra_binds:
- "/mongo:/mongo:rshared"