SparkApplication - Persistent Volume for NFS - kubernetes

I have some issues using Volume for sparkApplication. I'm describing the use of persistentVolumeClaim in the yaml file for SparkApplication as described in the documentation: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#mounting-volumes
However, the directory that is to be mapped to a directory on NFS is empty. No errors occur when running sparkApplication. When starting a regular pod, everything works correctly and the directory is mapped to the NFS directory.
The code for the yaml files is given in this repository: https://github.com/darkeugene/spark-on-k8s-error
Could you tell me, please, what the problem could be?
Thank you!

Related

How to mount a single file from the local Kubernetes cluster into the pods

I set up a local Kubernetes cluster using Kind, and then I run Apache-Airflow on it using Helm.
To actually create the pods and run Airflow, I use the command:
helm upgrade -f k8s/values.yaml airflow bitnami/airflow
which uses the chart airflow from the bitnami/airflow repo, and "feeds" it with the configuration of values.yaml.
The file values.yaml looks something like:
web:
extraVolumeMounts:
- name: functions
mountPath: /dir/functions/
extraVolumes:
- name: functions
hostPath:
path: /dir/functions/
type: Directory
where web is one component of Airflow (and one of the pods on my setup), and the directory /dir/functions/ is successfully mapped from the cluster inside the pod. However, I fail to do the same for a single, specific file, instead of a whole directory.
Does anyone knows the syntax for that? Or have an idea for an alternative way to map the file into the pod (its whole directory is successfully mapped into the cluster)?
There is a File type for hostPath which should behave like you desire, as it states in the docs:
File: A file must exist at the given path
which you can then use with the precise file path in mountPath. Example:
web:
extraVolumeMounts:
- name: singlefile
mountPath: /path/to/mount/the/file.txt
extraVolumes:
- name: singlefile
hostPath:
path: /path/on/the/host/to/the/file.txt
type: File
Or if it's not a problem, you could mount the whole directory containing it at the expected path.
With this said, I want to point out that using hostPath is (almost always) never a good idea.
If you have a cluster with more than one node, saying that your Pod is mounting an hostPath doesn't restrict it to run on a specific host (even tho you can enforce it with nodeSelectors and so on) which means that if the Pod starts on a different node, it may behave differently, not finding the directory and / or file it was expecting.
But even if you restrict the application to run on a specific node, you need to be ok with the idea that, if such node becomes unavailable, the Pod will not be scheduled on its own somewhere else.. meaning you'll need manual intervention to recover from a single node failure (unless the application is multi-instance and can resist one instance going down)
To conclude:
if you want to mount a path on a particular host, for whatever reason, I would go for local volumes.. or at least use hostPath and restrict the Pod to run on the specific node it needs to run on.
if you want to mount small, textual files, you could consider mounting them from ConfigMaps
if you want to configure an application, providing a set of files at a certain path when the app starts, you could go for an init container which prepares files for the main container in an emptyDir volume

K8s - an NFS share is only mounted read-only in the pod

Environment: external NFS share for persistent storage, accessible to all, R/W, Centos7 VMs (NFS share and K8s cluster), NFS utils installed on all workers.
Mount on a VM, e.g. a K8s worker node, works correctly, the share is R/W
Deployed in the K8s cluster: PV, PVC, Deployment (Volumes - referenced to PVC, VolumeMount)
The structure of the YAML files corresponds to the various instructions and postings, including the postings here on the site.
The pod starts, the share is mounted. Unfortunately, it is read-only. All the suggestions from the postings I have found about this did not work so far.
Any idea what else I could look out for, what else I could try?
Thanks. Thomas
After digging deep, I found the cause of the problem. Apparently, the syntax for the NFS export is very sensitive. One more space can be problematic.
On the NFS server, two export entries were stored in the kernel tables. The first R/O and the second R/W. I don't know whether this is a Centos bug because of the syntax in /etc/exports.
On another Centos machine I was able to mount the share without any problems (r/w). In the container (Debian-based image), however, not (only r/o). I have not investigated whether this is due to Kubernetes or Debian behaves differently.
After correcting the /etc/exports file and restarting the NFS server, there was only one correct entry in the kernel table. After that, mounting R/W worked on a Centos machine as well as in the Debian-based container inside K8s.
Here are the files / table:
privious /etc/exports:
/nfsshare 172.16.6.* (rw,sync,no_root_squash,no_all_squash,no_acl)
==> kernel:
/nfsshare 172.16.6.*(ro, ...
/nfsshare *(rw, ...
corrected /etc/exports (w/o the blank):
/nfsshare *(rw,sync,no_root_squash,no_all_squash,no_acl)
In principle, the idea of using an init container is a good one. Thank you for reminding me of this.
I have tried it.
Unfortunately, it doesn't change the basic problem. The file system is mounted "read-only" by Kubernetes. The init container returns the following error message (from the log):
chmod: /var/opt/dilax/enumeris: Read-only file system

Graylog in Kubernetes (1.20) cluster

I try to set up graylog.
This deploy works. But i need to add a volume to graylog deploy. Because I want to install plugins.
When i add volume (hostPath) and start pod, I get an error in my pod:
ERROR StatusLogger File not found in file system or classpath: /usr/share/graylog/data/config/log4j2.xml
ERROR StatusLogger Reconfiguration failed: No configuration found for '70dea4e' at 'null' in 'null'
06:37:19.707 [main] ERROR org.graylog2.bootstrap.CmdLineTool - Couldn't load configuration: Properties file /usr/share/graylog/data/config/graylog.conf doesn't exist!
I see, that pod create catalogues (owner id 1100:1100) in volume, but there is no any file there.
Kubernetes version is 1.20
Runtime in my kubernetes cluster is "containerd".
My Graylog dеploy:
Volume ounys for container:
volumeMounts:
- mountPath: /usr/share/graylog/data
name: graylog-data
Volume:
volumes:
- name: graylog-data
hostPath:
path: /mnt/k8s-storage/graylog-data
type: DirectoryOrCreate
There are few things to look at here starting from the concept of the hostPath volume:
A hostPath volume mounts a file or directory from the host node's
filesystem into your Pod.
Pods with identical configuration (such as created from a PodTemplate)
may behave differently on different nodes due to different files on
the nodes
The files or directories created on the underlying hosts are only
writable by root. You either need to run your process as root in a
privileged Container or modify the file permissions on the host to be
able to write to a hostPath volume
The hostPath would be good if for example you would like to use it for log collector running in a DaemonSet but in your described use case it might not be ideal because you don't directly control which node your pods will run on, so you're not guaranteed that the pod will actually be scheduled on the node that has the data volume.
But if that not the case, you also need to notice that type: DirectoryOrCreate is not best here as I see that you want to load a file. It would be better to use either:
File: A file must exist at the given path
or
FileOrCreate: If nothing exists at the given path, an empty file will be created there as needed with permission set to 0644, having the same group and ownership with Kubelet.
Lastly, there might be a permissions problem. As already stated:
The files or directories created on the underlying hosts are only
writable by root.
Graylog is running with the userid 1100 which might cause a permission denial. Also, I have found a similar issue that might be helpful for you.

Kubernetes Pod - Sync pod directory with a local directory

I have a python pod running.
This python pod is using different shared libraries. To make it easier to debug the shared libraries I would like to have the libraries directory on my host too.
The python dependencies are located in /usr/local/lib/python3.8/site-packages/ and I would like to access this directory on my host to modify some files.
Is that possible and if so, how? I have tried with emptyDir and PV but they always override what already exists in the directory.
Thank you!
This is by design. Kubelet is responsible for preparing the mounts for your container. At the time of mounting they are empty and kubelet has no reason to put any content in them.
That said, there are ways to achieve what you seem to expect by using init container. In your pod you define init container using your docker image, mount your volume in it in some path (ie. /target) but instead of running regular content of your container, run something like
cp -r /my/dir/* /target/
which will initiate your directory with expected content and exit allowing further startup of the pod
Please take a look: overriding-directory.
Another option is to use subPath. Subpath references files or directories that are controlled by the user, not the system. Take a loot on this example how to mount single file into existing directory:
---
volumeMounts:
- name: "config"
mountPath: "/<existing folder>/<file1>"
subPath: "<file1>"
- name: "config"
mountPath: "/<existing folder>/<file2>"
subPath: "<file2>"
restartPolicy: Always
volumes:
- name: "config"
configMap:
name: "config"
---
Check full example here. See: mountpath, files-in-folder-overriding.
You can also as #DavidMaze said debug your setup in a non-container Python virtual environment if you can, or as a second choice debugging the image in Docker without Kubernetes.
You can take into consideration also below third party tools, that were created especially for Kubernetes app developers keeping in mind this functionality (keep in-sync source and remote files).
Skaffold's Continuous Deployment workflow - it takes care of keeping source and remote files (Pod mounted directory) in sync.
Telepresence`s Volume access feature.

Where to store files in GKE container?

I'm having trouble understanding where to store files in a GKE container? I've seen the following documentation of the filesystem layout:
https://cloud.google.com/kubernetes-engine/docs/concepts/node-images#file_system_layout
But then there are also Dockerfile examples on the web that copy executable files to other paths not listed in the layout, such as /usr or /go. One of these examples is here:
https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/blob/master/hello-app/Dockerfile
Another question is: If I have runtime code that needs to download certain configuration information after the container starts, can I write the configuration file to the same directory as my executable? Or do I have to choose /etc or /tmp.
And finally, the layout documentation states that /home and /var store data for the the lifetime of the boot disk? What does that mean? How does that compare to the lifetime of the pod or the node?
When you want to store something in a container you can either store something ephemeral or permanent
To store ephemeral way just choose a path /tmp, /var, /opt etc (this depends on the container set up as well), once the container is restarted the information you would have is the same at the moment the container was created, for instance your binary files and initial config files.
To store permanent you must have to mount a volume, this is a support for your container where a volume (container path) is linked with a external storage. with this if your container is restarted the volume will be mounted once the container is ready again and you are no gonna lose anything.
In kubernetes this is called Persistent Volumes and you can leverage this even if you are in another cloud provider,
steps to used
Define a path where you would mount the volume in your source code example /myfiles/private
Create a storage class in your GKE https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/ssd-pd
Create a Persistent Volume Claim in your GKE https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/ssd-pd
Relate this storage class with your Kubernetes deployment
Example
link the volume with your container
volumeMounts:
- mountPath: /myfiles/private
name: any-name-you-want
relate the persistent volume with your deployment
volumes:
- name: any-name-you-want
persistentVolumeClaim:
claimName: my-claim-name
This is really up to you. By default most base images will leave /tmp writeable as per normal. But anything written inside the container will be gone if/when the container restarts for any reason. For something like config data, that might be fine, for a database probably less so. To get more stable storage you need to use a Volume. The exact type to use depends on your environment and how long the data should live. An emptyDir volume lives only as long as the pod but can be shared between containers in the same pod. Beyond that you would probably use a PersistentVolumeClaim to dynamically provision a new Google Cloud disk which will last unless the claim is deleted (or forever depending on your Reclaim setting).