I have a simple web app that uses a volume/persistent volume claim to serve static data. Pods get scheduled only on the first worker node, where the volume resides.
How do I deal with a shared volume between nodes if I want to scale the pods and have them allocated across all the available nodes?
containers:
  - name: app-nginx
    image: nginx:1.20.0
    command: ["/usr/sbin/nginx", "-g", "daemon off;"]
    volumeMounts:
      - name: static-volume
        mountPath: "/opt/static"
volumes:
  - name: static-volume
    persistentVolumeClaim:
      claimName: pvc-static
One option is to use NFS and not physically allocate the volume on an EC2 instance.
Another way is to duplicate the static files for each pod (populating them into the proper directory with an init container), but that takes extra time and is not feasible for a lot of static data.
What's the proper way to deal with such a case within Kubernetes? Is there a way to declare a deployment that uses logically the same volume but physically different instances located on different nodes?
What you are looking for is a volume provider that supports the ReadOnlyMany or ReadWriteMany access mode.
Follow the documentation link to get a list of the officially supported ones.
If you are on AWS, then using EFS through the NFS plugin will probably be the easiest solution, but bear in mind that it is NFS-based, so you might incur a performance penalty.
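As a rough illustration, a claim requesting a shared access mode might look like the sketch below. The claim name pvc-static comes from the question. The storage class name efs-sc and the requested size are assumptions, so substitute whatever ReadWriteMany-capable class your cluster provides.

```yaml
# Sketch only: `efs-sc` is a hypothetical storage class name; use the
# RWX-capable class that actually exists in your cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-static            # claim name taken from the question
spec:
  accessModes:
    - ReadWriteMany           # allows pods on different nodes to mount the volume
  storageClassName: efs-sc    # assumption, not from the original post
  resources:
    requests:
      storage: 5Gi            # illustrative value; the field itself is required
```

With a claim like this, the deployment from the question can keep referencing claimName: pvc-static unchanged.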
As a side note, what you are trying to do smells like an anti-pattern to me.
Docker images for an application should be self-contained. In your case, a container serving a static website should contain all the static files it needs in order to be fully portable. This would completely remove the need for an external volume holding the data.
Yes, you are right, one option is to use NFS. You need a volume that supports ReadWriteMany or ReadOnlyMany: https://stackoverflow.com/a/57798369/5525824
If ReadOnlyMany is enough for your scenario, you can create the PVC in GCP with GKE: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
However, if you also need write access, the available options are a shared file system or NFS.
If you do not want to use a managed service and prefer to stay cloud-agnostic, you can also check out MinIO: https://min.io/
NAS: https://github.com/minio/charts#nas-gateway
FYI, there is also https://github.com/ctrox/csi-s3, but its performance might not be good.
Related
I set up a local Kubernetes cluster using Kind, and then I run Apache-Airflow on it using Helm.
To actually create the pods and run Airflow, I use the command:
helm upgrade -f k8s/values.yaml airflow bitnami/airflow
which uses the chart airflow from the bitnami/airflow repo, and "feeds" it with the configuration of values.yaml.
The file values.yaml looks something like:
web:
  extraVolumeMounts:
    - name: functions
      mountPath: /dir/functions/
  extraVolumes:
    - name: functions
      hostPath:
        path: /dir/functions/
        type: Directory
where web is one component of Airflow (and one of the pods on my setup), and the directory /dir/functions/ is successfully mapped from the cluster inside the pod. However, I fail to do the same for a single, specific file, instead of a whole directory.
Does anyone know the syntax for that? Or have an idea for an alternative way to map the file into the pod (its whole directory is successfully mapped into the cluster)?
There is a File type for hostPath which should behave like you desire, as it states in the docs:
File: A file must exist at the given path
which you can then use with the precise file path in mountPath. Example:
web:
  extraVolumeMounts:
    - name: singlefile
      mountPath: /path/to/mount/the/file.txt
  extraVolumes:
    - name: singlefile
      hostPath:
        path: /path/on/the/host/to/the/file.txt
        type: File
Or if it's not a problem, you could mount the whole directory containing it at the expected path.
With this said, I want to point out that using hostPath is almost never a good idea.
If you have a cluster with more than one node, mounting a hostPath in your Pod doesn't restrict it to run on a specific host (even though you can enforce that with nodeSelectors and so on), which means that if the Pod starts on a different node, it may behave differently, not finding the directory and/or file it was expecting.
But even if you restrict the application to run on a specific node, you need to be OK with the idea that, if that node becomes unavailable, the Pod will not be rescheduled somewhere else on its own, meaning you'll need manual intervention to recover from a single node failure (unless the application is multi-instance and can tolerate one instance going down).
To conclude:
If you want to mount a path on a particular host, for whatever reason, I would go for local volumes, or at least use hostPath and restrict the Pod to run on the specific node it needs to run on.
If you want to mount small, textual files, you could consider mounting them from ConfigMaps.
If you want to configure an application by providing a set of files at a certain path when the app starts, you could go for an init container that prepares the files for the main container in an emptyDir volume.
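For the ConfigMap option, a minimal sketch of mounting one key as a single file could look like this. All names here (functions-config, helpers.py, single-file-demo, the busybox image) are hypothetical and only serve the illustration. The subPath field is what limits the mount to one file instead of a directory.

```yaml
# Sketch only: ConfigMap name, file name, pod name and image are hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: functions-config
data:
  helpers.py: |
    def greet(name):
        return f"hello {name}"
---
apiVersion: v1
kind: Pod
metadata:
  name: single-file-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "cat /dir/functions/helpers.py && sleep 3600"]
      volumeMounts:
        - name: functions
          mountPath: /dir/functions/helpers.py   # full path of the file inside the container
          subPath: helpers.py                    # mount only this ConfigMap key
  volumes:
    - name: functions
      configMap:
        name: functions-config
```

One trade-off to be aware of: a file mounted through subPath does not pick up later changes to the ConfigMap until the pod is recreated.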
I read an article here in which data is shared between 2 different containers in the same Pod. Both containers have a volumeMount with the name shared-data, but each has a different mountPath.
My question is: if these mountPaths are not the same, how are they sharing data? And what is the path of the volume shared-data? My thought is that both should have the same path in order to share data. It seems I have misunderstood some concept, but I'm not sure which.
Kubernetes maintains the storage internally. It doesn't have a fixed path that you can see, and it doesn't matter if it gets mounted in the same place in different containers.
By way of analogy, imagine you have an external USB drive. If you've unplugged the drive, it doesn't make sense to ask "what is its path"; and if you plug it in and mount it on /mnt/usb on one machine, that doesn't stop you from mounting it on /home/me/app/data when you plug it into a different machine.
The volume does have a name within its pod (in your example, shared-data). If the volume is backed by a PersistentVolumeClaim that will also have a name. Potentially the matching PersistentVolume is something like an AWS EBS volume, and that will have a name. But none of these names are fixed filesystem paths, and for the most part you can't directly use these to access the file content.
Only one volume, shared-data, is created. It is declared at the pod level and is initially empty:
volumes:
  - name: shared-data
    emptyDir: {}
It is shared between the two containers. The volume exists at the pod level, and its lifetime depends only on the pod, not on the two containers. It is bind-mounted into both, so whatever you add or edit from one container is visible in the other (in your case, the index.html added from the debian container). And yes, you can find the path of the volume on the node: /var/lib/kubelet/pods/PODUID/volumes/kubernetes.io~empty-dir/VOLUMENAME. A similar question is answered here.
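To make this concrete, here is a sketch of the kind of pod the question describes: one emptyDir volume mounted at two different paths. The volume name shared-data, the debian container and index.html come from the thread. The pod name, container names, images and mount paths are assumptions for illustration.

```yaml
# Sketch only: pod name, container names, images and mount paths are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:
  restartPolicy: Never
  volumes:
    - name: shared-data
      emptyDir: {}                       # one volume, owned by the pod
  containers:
    - name: nginx-container
      image: nginx
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html   # path as seen by nginx
    - name: debian-container
      image: debian
      volumeMounts:
        - name: shared-data
          mountPath: /pod-data               # same volume, different path
      command: ["/bin/sh"]
      args: ["-c", "echo Hello from the debian container > /pod-data/index.html"]
```

The file written to /pod-data/index.html by the second container shows up as /usr/share/nginx/html/index.html in the first, because both paths point at the same underlying directory.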
I am migrating my previous deployment, made with docker-compose, to Kubernetes.
In my previous deployment, some containers have data created at build time in certain paths, and these paths are mounted as persistent volumes.
Therefore, as the Docker volume documentation states, the persistent volume (not a bind mount) is pre-populated with the content of the container directory.
I'd like to achieve this behavior with Kubernetes and its persistent volumes. How can I do that? Do I need to add some kind of logic using scripts to copy my container's files to the mounted path when the data is not present the first time the container starts?
Possibly related question: Kubernetes mount volume on existing directory with files inside the container
I think your options are
ConfigMap (is "some data" a set of configuration files?)
Init containers (as mentioned)
CSI Volume Cloning (cloning, combined with an init container or your first app container)
There used to be a gitRepo volume type; it is deprecated in favour of init containers, from which you can clone your config and data
A hostPath volume mount is an option too
An NFS volume is probably a very reasonable option, and similar in approach to your Docker volumes
Storage types such as NFS, iSCSI, awsElasticBlockStore, gcePersistentDisk and others can be pre-populated. There are constraints. NFS is probably the most flexible for sharing bits and bytes.
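The init container option can be sketched as follows. This is a minimal example under stated assumptions, not a definitive recipe: the image name, the /app/data path, the claim name and the .seeded marker file are all hypothetical. The init container runs the same image as the app, mounts the volume at a different path so the image's own files stay visible, and copies them only on first start.

```yaml
# Sketch only: image, paths, claim name and the `.seeded` marker are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: prepopulate-demo
spec:
  initContainers:
    - name: seed-data
      image: registry.example.com/my-app:1.0   # same image as the app container
      command:
        - sh
        - -c
        - |
          # Copy the build-time data only once, using a marker file as the guard.
          if [ ! -f /seed-target/.seeded ]; then
            cp -a /app/data/. /seed-target/
            touch /seed-target/.seeded
          fi
      volumeMounts:
        - name: app-data
          mountPath: /seed-target   # not /app/data, so the image content is still readable
  containers:
    - name: app
      image: registry.example.com/my-app:1.0
      volumeMounts:
        - name: app-data
          mountPath: /app/data      # the volume now hides the image directory
  volumes:
    - name: app-data
      persistentVolumeClaim:
        claimName: app-data-claim
```

A marker file is used instead of an "is the directory empty" check because a freshly formatted volume may already contain entries such as lost+found.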
FYI
subPath might be of interest too, depending on your use case, and
PodPreset might help in streamlining the operation across your fleet of pods (note that the PodPreset API was removed in Kubernetes 1.20)
HTH
I'm new to Kubernetes and learning it.
I have a Deployment with replicas=3.
Is there any way I can mount a separate volume for each pod, plus one volume shared by all pods?
Requirements:
Case 1: My application generates a temp file named tempfile.txt. There are three replica pods, and each one generates its own tempfile.txt, but the content might differ. If I use a shared volume, they will overwrite each other.
Case 2: I have a common file that is not part of the image and that is used by all pods when starting the application, i.e. files copied from the host to the containers of all pods.
Thanks in advance.
There are multiple ways to achieve the first part. Here is mine:
Instead of a Deployment, use a StatefulSet to create the replicas. StatefulSets allow you to include a volume claim template, from which a volume is created along with each pod, so each new pod gets a new PV created specifically for it.
This does require your cluster to support dynamically provisioned volumes.
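A minimal sketch of that StatefulSet approach is shown below. The resource names, image, mount path and size are assumptions for illustration, and a headless Service named app is assumed to exist. The volumeClaimTemplates section is what gives each replica its own claim.

```yaml
# Sketch only: names, image, mount path and size are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app
spec:
  serviceName: app            # headless Service assumed to exist
  replicas: 3
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: busybox:1.36
          # Each replica writes its own tempfile.txt into its own volume.
          command: ["sh", "-c", "hostname > /data/tempfile.txt && sleep 3600"]
          volumeMounts:
            - name: temp
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: temp            # yields claims temp-app-0, temp-app-1, temp-app-2
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

Because no storageClassName is set, the claims fall back to the cluster's default storage class, which is where the dynamic provisioning requirement comes in.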
Depending on the size of your tempfile.txt, your use case, and your cluster/node configuration, you might also want to consider using a hostPath volume, which uses the local storage of your node.
For the second part of your question, any ReadWriteMany volume will work (such as any NFS option).
On the note of subPath, this should also work, as long as you define a different subPath for each pod. The example in the link provided by DT does this by creating a subPath based on the pod name.
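The per-pod subPath idea can be sketched with subPathExpr and the downward API, as below. This is an illustration under assumptions rather than the exact example from the linked answer: the Deployment name, image, mount path and the claim shared-claim (assumed to be ReadWriteMany) are hypothetical.

```yaml
# Sketch only: names, image and the RWX claim `shared-claim` are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: busybox:1.36
          command: ["sh", "-c", "hostname > /data/tempfile.txt && sleep 3600"]
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name   # downward API: this pod's name
          volumeMounts:
            - name: shared
              mountPath: /data
              subPathExpr: $(POD_NAME)       # each pod gets its own subdirectory
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: shared-claim
```

Each replica then writes to a directory named after itself inside the shared volume, so the tempfile.txt files no longer collide.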
I'm having trouble understanding where to store files in a GKE container. I've seen the following documentation of the filesystem layout:
https://cloud.google.com/kubernetes-engine/docs/concepts/node-images#file_system_layout
But then there are also Dockerfile examples on the web that copy executable files to other paths not listed in the layout, such as /usr or /go. One of these examples is here:
https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/blob/master/hello-app/Dockerfile
Another question is: if I have runtime code that needs to download certain configuration information after the container starts, can I write the configuration file to the same directory as my executable? Or do I have to choose /etc or /tmp?
And finally, the layout documentation states that /home and /var store data for the lifetime of the boot disk. What does that mean? How does that compare to the lifetime of the pod or the node?
When you want to store something in a container, the storage can be either ephemeral or permanent.
For ephemeral storage, just choose a path such as /tmp, /var, or /opt (this also depends on how the container is set up). Once the container is restarted, its filesystem goes back to the state it had when the container was created, for instance your binary files and initial config files.
For permanent storage, you have to mount a volume, which links a path in the container to external storage. If your container is restarted, the volume is mounted again once the container is ready, so you do not lose anything.
In Kubernetes this is done with Persistent Volumes, and you can use them on other cloud providers as well.
Steps to use them:
Define the path where your code expects the volume to be mounted, for example /myfiles/private
Create a storage class in your GKE cluster https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/ssd-pd
Create a Persistent Volume Claim in your GKE cluster https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/ssd-pd
Reference this claim in your Kubernetes deployment
Example
Link the volume with your container:
volumeMounts:
  - mountPath: /myfiles/private
    name: any-name-you-want
Reference the persistent volume claim in your deployment:
volumes:
  - name: any-name-you-want
    persistentVolumeClaim:
      claimName: my-claim-name
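The storage class and claim from the steps above might look like the following sketch. The claim name my-claim-name matches the snippet above. The class name ssd and the size are hypothetical, and the provisioner shown assumes the Compute Engine persistent disk CSI driver is enabled on the cluster.

```yaml
# Sketch only: class name and size are hypothetical; the provisioner assumes
# the Compute Engine persistent disk CSI driver is enabled on the GKE cluster.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd                     # SSD-backed persistent disk
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim-name              # referenced by claimName in the deployment
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ssd
  resources:
    requests:
      storage: 10Gi
```

WaitForFirstConsumer delays disk creation until a pod using the claim is scheduled, so the disk is created in the same zone as that pod.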
This is really up to you. By default, most base images leave /tmp writable as normal. But anything written inside the container will be gone if and when the container restarts for any reason. For something like config data that might be fine; for a database, probably less so. To get more stable storage you need to use a Volume. The exact type to use depends on your environment and how long the data should live. An emptyDir volume lives only as long as the pod but can be shared between containers in the same pod. Beyond that, you would probably use a PersistentVolumeClaim to dynamically provision a new Google Cloud disk, which will last until the claim is deleted (or forever, depending on your reclaim policy).