Kubernetes Pod - Sync pod directory with a local directory

I have a python pod running.
This python pod is using different shared libraries. To make it easier to debug the shared libraries I would like to have the libraries directory on my host too.
The python dependencies are located in /usr/local/lib/python3.8/site-packages/ and I would like to access this directory on my host to modify some files.
Is that possible and if so, how? I have tried with emptyDir and PV but they always override what already exists in the directory.
Thank you!

This is by design. Kubelet is responsible for preparing the mounts for your container. At the time of mounting they are empty and kubelet has no reason to put any content in them.
That said, there are ways to achieve what you seem to expect by using an init container. In your pod, define an init container that uses your Docker image and mounts your volume at some path (e.g. /target), but instead of running the regular content of your container, run something like
cp -r /my/dir/* /target/
which will populate your directory with the expected content and exit, allowing further startup of the pod.
Please take a look: overriding-directory.
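As an illustration, a minimal sketch of that pattern (the names python-debug and my-python-image are placeholders, not from the original question); the same idea works with a hostPath volume instead of emptyDir if you want the files visible on the node:
apiVersion: v1
kind: Pod
metadata:
  name: python-debug
spec:
  volumes:
    - name: site-packages
      emptyDir: {}
  initContainers:
    - name: copy-packages
      image: my-python-image   # your application image
      # copy the packages baked into the image into the (initially empty) volume
      command: ["sh", "-c", "cp -r /usr/local/lib/python3.8/site-packages/* /target/"]
      volumeMounts:
        - name: site-packages
          mountPath: /target
  containers:
    - name: app
      image: my-python-image
      volumeMounts:
        - name: site-packages
          mountPath: /usr/local/lib/python3.8/site-packages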
Another option is to use subPath. A subPath references files or directories that are controlled by the user, not the system. Take a look at this example of how to mount a single file into an existing directory:
---
volumeMounts:
  - name: "config"
    mountPath: "/<existing folder>/<file1>"
    subPath: "<file1>"
  - name: "config"
    mountPath: "/<existing folder>/<file2>"
    subPath: "<file2>"
restartPolicy: Always
volumes:
  - name: "config"
    configMap:
      name: "config"
---
Check the full example here. See: mountpath, files-in-folder-overriding.
You can also, as @DavidMaze said, debug your setup in a non-container Python virtual environment if you can, or, as a second choice, debug the image in Docker without Kubernetes.

You can also take into consideration the third-party tools below, which were created especially for Kubernetes app developers with exactly this functionality (keeping source and remote files in sync) in mind.
Skaffold's continuous development workflow - it takes care of keeping source files and remote files (the Pod-mounted directory) in sync; see the sketch after this list.
Telepresence's Volume access feature.
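For example, a rough sketch of a Skaffold file-sync rule (the image name, source glob, and manifest path are assumptions; adjust the apiVersion and check the schema of your Skaffold version):
apiVersion: skaffold/v2beta26
kind: Config
build:
  artifacts:
    - image: my-python-app
      sync:
        manual:
          # copy changed .py files straight into the running container
          - src: "src/**/*.py"
            dest: /usr/local/lib/python3.8/site-packages/
deploy:
  kubectl:
    manifests:
      - k8s/*.yaml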

Related

How to mount a single file from the local Kubernetes cluster into the pods

I set up a local Kubernetes cluster using Kind, and then I run Apache-Airflow on it using Helm.
To actually create the pods and run Airflow, I use the command:
helm upgrade -f k8s/values.yaml airflow bitnami/airflow
which uses the airflow chart from the bitnami repo and "feeds" it the configuration from values.yaml.
The file values.yaml looks something like:
web:
  extraVolumeMounts:
    - name: functions
      mountPath: /dir/functions/
  extraVolumes:
    - name: functions
      hostPath:
        path: /dir/functions/
        type: Directory
where web is one component of Airflow (and one of the pods in my setup), and the directory /dir/functions/ is successfully mapped from the cluster into the pod. However, I fail to do the same for a single, specific file instead of a whole directory.
Does anyone know the syntax for that? Or have an idea for an alternative way to map the file into the pod (its whole directory is successfully mapped into the cluster)?
There is a File type for hostPath which should behave as you desire, as stated in the docs:
File: A file must exist at the given path
which you can then use with the precise file path in mountPath. Example:
web:
  extraVolumeMounts:
    - name: singlefile
      mountPath: /path/to/mount/the/file.txt
  extraVolumes:
    - name: singlefile
      hostPath:
        path: /path/on/the/host/to/the/file.txt
        type: File
Or if it's not a problem, you could mount the whole directory containing it at the expected path.
With this said, I want to point out that using hostPath is almost never a good idea.
If you have a cluster with more than one node, saying that your Pod mounts a hostPath doesn't restrict it to run on a specific host (even though you can enforce it with nodeSelectors and so on), which means that if the Pod starts on a different node, it may behave differently, not finding the directory and/or file it was expecting.
But even if you restrict the application to run on a specific node, you need to be OK with the idea that, if that node becomes unavailable, the Pod will not be scheduled somewhere else on its own, meaning you'll need manual intervention to recover from a single node failure (unless the application is multi-instance and can survive one instance going down).
To conclude:
if you want to mount a path on a particular host, for whatever reason, I would go for local volumes, or at least use hostPath and restrict the Pod to run on the specific node it needs to run on.
if you want to mount small, textual files, you could consider mounting them from ConfigMaps (see the sketch after this list).
if you want to configure an application, providing a set of files at a certain path when the app starts, you could go for an init container which prepares files for the main container in an emptyDir volume.
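For the ConfigMap option above, a minimal sketch (the names functions-config and my_function.py are illustrative), combined with the subPath trick so the rest of /dir/functions/ is not hidden:
apiVersion: v1
kind: ConfigMap
metadata:
  name: functions-config
data:
  my_function.py: |
    def hello():
        return "hello"
and the corresponding pieces of the Pod/Deployment spec:
volumeMounts:
  - name: functions-config
    mountPath: /dir/functions/my_function.py
    subPath: my_function.py
volumes:
  - name: functions-config
    configMap:
      name: functions-config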

Kubernetes volume duplication between worker nodes

I have a simple web app that uses a volume/persistent volume claim to serve static data from there. Pods get scheduled only on the first worker node, where the volume resides.
How to deal with the shared volume between nodes in case I want to scale pods and have them allocated across all the available nodes?
containers:
  - name: app-nginx
    image: nginx:1.20.0
    command: ["/usr/sbin/nginx", "-g", "daemon off;"]
    volumeMounts:
      - name: static-volume
        mountPath: "/opt/static"
volumes:
  - name: static-volume
    persistentVolumeClaim:
      claimName: pvc-static
One option is to use NFS and not physically allocate the volume on an EC2 instance.
Another way is to duplicate the static files for each pod (populate them into the proper directory with an init container), but that requires extra time and is not feasible for a lot of static data.
What's the proper way to deal with such a case within Kubernetes? Is there a way to declare a deployment which will use the logically same volume but physically different instances located on different nodes?
What you are looking for is a volume provider that supports the ReadOnlyMany or ReadWriteMany access mode.
Follow the documentation link to get a list of the officially supported ones.
If you are on AWS, then using EFS through the NFS plugin will probably be the easiest solution, but please take into account the fact that it is an NFS-based solution and you might hit a performance penalty.
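As a sketch, a PVC requesting ReadWriteMany that the Deployment above could reference via claimName: pvc-static (the storageClassName efs-sc is a placeholder for whatever class your EFS CSI driver exposes):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-static
spec:
  accessModes:
    - ReadWriteMany      # allows Pods on different nodes to mount the same volume
  storageClassName: efs-sc
  resources:
    requests:
      storage: 1Gi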
As a side note, what you are trying to do smells like an anti-pattern to me.
Docker images for an application should be self-contained. In your case, a container serving a static website should contain all the static files it needs in order to be fully portable. This would remove the need to have an external volume with the data completely.
Yes, you are right, one option is to use NFS. You have to implement ReadWriteMany or ReadOnlyMany: https://stackoverflow.com/a/57798369/5525824
If you have a ReadOnlyMany scenario, you can create the PVC in GCP with GKE: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
However, if you also need to do write operations, the available option is to use a shared filesystem or NFS.
You can also check out Minio if you don't want to use any managed service and want to stay cloud-agnostic: https://min.io/
NAS: https://github.com/minio/charts#nas-gateway
Just FYI: https://github.com/ctrox/csi-s3 - performance might not be good.

From docker-compose to Kubernetes as a development environment using minikube

Right now I use docker-compose for development. It is a great tool that comes in handy on simple projects with a maximum of 3-6 active services, but when it comes to 6-8 and more it becomes hard to manage.
So I've started to learn k8s on minikube and now I have a few questions:
How to make a "two-way" binding for volumes? For example, if I have a folder named "my-frontend" and I want to sync a specific folder in a deployment, how do I "link" them using PV and PVC?
Very often it comes in handy to create a service with a specific environment like node:12.0.0 and then use it as a command executor, like this: docker-compose run workspace npm install
How to achieve something like this using k8s?
How to make "two-way" binding for volumes? for example if i have folder named "my-frontend" and i want to sync specific folder in deployment, how to "link" them using PV and PVC ?
You need to create a PersistentVolume which in your case will use a specific directory on the host; in the Kubernetes official documentation there's an example with this same use case.
Then a PersistentVolumeClaim to request some space from this volume (also an example in the previous documentation) and then mount the PVC on the pod/deployment where you need it.
volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: my-pvc
containers:
  ....
    volumeMounts:
      - mountPath: "/mount/path/in/pod"
        name: my-volume
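For completeness, a minimal sketch of the PersistentVolume and PersistentVolumeClaim referenced above, close to the example in the official documentation (the host path /data/my-frontend and the names my-pv/my-pvc are placeholders):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  storageClassName: manual   # matched by the PVC below so it binds to this PV
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/my-frontend
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi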
Very often it comes in handy to create a service with a specific environment like node:12.0.0 and then use it as a command executor, like this: docker-compose run workspace npm install
How to achieve something like this using k8s?
You need to use kubectl; it has functionality very similar to the docker CLI and also supports a run command with very similar parameters. Alternatively, you can start your pod once and then run commands multiple times in the same instance by using kubectl exec.
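A rough equivalent of docker-compose run workspace npm install (the pod name and image are illustrative):
kubectl run workspace --image=node:12.0.0 --rm -it --restart=Never -- npm install
and, for a pod named workspace that is already running:
kubectl exec -it workspace -- npm install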

Graylog in Kubernetes (1.20) cluster

I am trying to set up Graylog.
This deployment works, but I need to add a volume to the Graylog deployment because I want to install plugins.
When I add a volume (hostPath) and start the pod, I get an error in my pod:
ERROR StatusLogger File not found in file system or classpath: /usr/share/graylog/data/config/log4j2.xml
ERROR StatusLogger Reconfiguration failed: No configuration found for '70dea4e' at 'null' in 'null'
06:37:19.707 [main] ERROR org.graylog2.bootstrap.CmdLineTool - Couldn't load configuration: Properties file /usr/share/graylog/data/config/graylog.conf doesn't exist!
I see that the pod creates directories (owner id 1100:1100) in the volume, but there are no files there.
The Kubernetes version is 1.20.
The runtime in my Kubernetes cluster is containerd.
My Graylog deployment:
Volume mounts for the container:
volumeMounts:
  - mountPath: /usr/share/graylog/data
    name: graylog-data
Volume:
volumes:
  - name: graylog-data
    hostPath:
      path: /mnt/k8s-storage/graylog-data
      type: DirectoryOrCreate
There are a few things to look at here, starting from the concept of the hostPath volume:
A hostPath volume mounts a file or directory from the host node's filesystem into your Pod.
Pods with identical configuration (such as created from a PodTemplate) may behave differently on different nodes due to different files on the nodes.
The files or directories created on the underlying hosts are only writable by root. You either need to run your process as root in a privileged Container or modify the file permissions on the host to be able to write to a hostPath volume.
The hostPath would be good if, for example, you would like to use it for a log collector running in a DaemonSet, but in your described use case it might not be ideal, because you don't directly control which node your pods will run on, so you're not guaranteed that the pod will actually be scheduled on the node that has the data volume.
But even if that is not the case, you also need to note that type: DirectoryOrCreate is not the best choice here, as I see that you want to load a file. It would be better to use either:
File: A file must exist at the given path
or
FileOrCreate: If nothing exists at the given path, an empty file will be created there as needed with permission set to 0644, having the same group and ownership with Kubelet.
Lastly, there might be a permissions problem. As already stated:
The files or directories created on the underlying hosts are only writable by root.
Graylog is running with user id 1100, which might cause a permission denial. Also, I have found a similar issue that might be helpful for you.
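One common workaround (a sketch, not part of the original answer; the busybox image and container name are assumptions) is an init container that runs as root and fixes ownership of the hostPath mount before Graylog (uid 1100) starts:
initContainers:
  - name: fix-permissions
    image: busybox:1.35
    # give the Graylog user ownership of the hostPath directory
    command: ["sh", "-c", "chown -R 1100:1100 /usr/share/graylog/data"]
    securityContext:
      runAsUser: 0
    volumeMounts:
      - mountPath: /usr/share/graylog/data
        name: graylog-data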

Where to store files in GKE container?

I'm having trouble understanding where to store files in a GKE container. I've seen the following documentation of the filesystem layout:
https://cloud.google.com/kubernetes-engine/docs/concepts/node-images#file_system_layout
But then there are also Dockerfile examples on the web that copy executable files to other paths not listed in the layout, such as /usr or /go. One of these examples is here:
https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/blob/master/hello-app/Dockerfile
Another question is: if I have runtime code that needs to download certain configuration information after the container starts, can I write the configuration file to the same directory as my executable? Or do I have to choose /etc or /tmp?
And finally, the layout documentation states that /home and /var store data for the lifetime of the boot disk. What does that mean? How does that compare to the lifetime of the pod or the node?
When you want to store something in a container, you can store it either ephemerally or permanently.
To store it ephemerally, just choose a path such as /tmp, /var, /opt, etc. (this depends on the container setup as well); once the container is restarted, the information you will have is the same as at the moment the container was created, for instance your binary files and initial config files.
To store it permanently, you have to mount a volume; this is support for your container where a volume (container path) is linked to external storage. With this, if your container is restarted, the volume will be mounted once the container is ready again and you are not going to lose anything.
In Kubernetes this is called Persistent Volumes, and you can leverage this even if you are on another cloud provider.
Steps to use:
Define a path where you will mount the volume in your source code, for example /myfiles/private
Create a storage class in your GKE https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/ssd-pd
Create a Persistent Volume Claim in your GKE https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/ssd-pd
Relate this storage class with your Kubernetes deployment
Example
Link the volume with your container:
volumeMounts:
  - mountPath: /myfiles/private
    name: any-name-you-want
Relate the persistent volume claim with your deployment:
volumes:
  - name: any-name-you-want
    persistentVolumeClaim:
      claimName: my-claim-name
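As a sketch of steps 2 and 3 above (the class name ssd-storage and the 10Gi size are assumptions; newer GKE clusters typically use the pd.csi.storage.gke.io provisioner instead of the in-tree one):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-storage
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd           # SSD persistent disks, as in the linked guide
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim-name    # matches the claimName used in the deployment above
spec:
  storageClassName: ssd-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi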
This is really up to you. By default most base images will leave /tmp writeable as per normal. But anything written inside the container will be gone if/when the container restarts for any reason. For something like config data, that might be fine, for a database probably less so. To get more stable storage you need to use a Volume. The exact type to use depends on your environment and how long the data should live. An emptyDir volume lives only as long as the pod but can be shared between containers in the same pod. Beyond that you would probably use a PersistentVolumeClaim to dynamically provision a new Google Cloud disk which will last unless the claim is deleted (or forever depending on your Reclaim setting).
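To make the emptyDir point concrete, a minimal sketch (all names are illustrative) of two containers in one pod sharing the same scratch volume, which lives only as long as the pod does:
apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch
spec:
  volumes:
    - name: scratch
      emptyDir: {}
  containers:
    - name: writer
      image: busybox:1.35
      command: ["sh", "-c", "echo hello > /scratch/hello.txt && sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
    - name: reader
      image: busybox:1.35
      # waits briefly, then reads what the other container wrote
      command: ["sh", "-c", "sleep 5 && cat /scratch/hello.txt && sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch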