Mapping local directory to Kubernetes

I am using Docker Desktop to run an application on the Kubernetes platform, and I need a location to store files. How can I make my local directory (c:\app-data) available to the application running in Kubernetes?

I had a similar problem. Docker containers are usually meant to be throwaway/gateway containers, so people don't normally use them for storing files.
That being said, you have two options:
Add the path and files to the Docker container, which will cause your container to be massive in size (NOT RECOMMENDED). The Docker build will require substantial time and memory, as all the files will be copied. Here's an example of creating a local Ubuntu container with Docker: https://thenewstack.io/docker-basics-how-to-share-data-between-a-docker-container-and-host/
Host your files through another server/API, and fetch those files using simple requests in your app. I used this solution. The only caveat is that you need to be able to host your files somehow. This is easy enough, but may require extra payment. https://www.techradar.com/best/file-hosting-and-sharing-services

You can't really do this. The right approach depends on what the data you're trying to store is.
If you're just trying to store data somewhere – perhaps it's the backing data for a MySQL StatefulSet – you can create a PersistentVolumeClaim like normal. Minikube includes a minimal volume provisioner so you should automatically get a PersistentVolume created; you don't need to do any special setup for this. But, the PersistentVolume will live within the minikube container/VM; if you completely delete the minikube setup, it could delete that data, and you won't be able to directly access the data from the host.
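For example, a minimal PersistentVolumeClaim could look like the sketch below (the name and size are placeholders, not anything from the original question):

```yaml
# Minimal PVC sketch; minikube's default provisioner will create a
# matching PersistentVolume for it automatically.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data        # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi        # adjust to the size of your data
```

A pod or StatefulSet then refers to this claim by claimName in its volumes: section.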
If you have a data set on the host that your container needs to access, there are a couple of ways to do it. Keep in mind that, in a "real" Kubernetes cluster, you won't be able to access your local filesystem at all. Creating a PersistentVolume as above and then running a pod to copy the data into it could be one approach; as @ParmandeepChaddha suggests in their answer, baking the data into the image is another option (this can be very reasonable if the data is only a couple of megabytes).
If the data is the input or output data to your process, you can also consider restructuring your application so that it transfers that data over a protocol like HTTP. Set up a NodePort Service in front of your application, and use a tool like curl to HTTP POST the data into the service.
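A rough sketch of such a Service, assuming the application's pods are labelled app: my-app (a made-up label) and listen on port 8080:

```yaml
# NodePort Service sketch; Kubernetes picks the node port unless you
# pin it explicitly with nodePort:.
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
    - port: 80            # port inside the cluster
      targetPort: 8080    # container port of the application
```

On minikube you could then push data in with something like curl -X POST --data-binary @input.dat "$(minikube service my-app --url)".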
Finally, you could be considering a setup where all of the important data is local: you have some batch files on the local system, the job's purpose is to convert some local files to other local files, and it's just that the program is in minikube. (Or, similarly, you're trying to develop your application and the source files are on your local system.) In this case Kubernetes, as a distributed, clustered container system, isn't the right tool. Running the application directly on your system is the best approach; you can simulate this with a docker run -v bind mount, but this is inconvenient and can lead to permission and environment problems.
(In theory you can use a hostPath volume too, and minikube has some support to mount a host directory into the VM. In practice, the setup required to do this is as complex as the rest of your Kubernetes setup combined, and it won't be portable to any other Kubernetes installation. I wouldn't attempt this.)

You can mount your local directory to your kubernetes Pod using hostPath. Your path c:\app-data on your Windows host should be represented as either /C/app-data or /host_mnt/c/app-data, depending on your Docker Desktop version as suggested in this comment.
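A hedged sketch of what that could look like, with illustrative names and an arbitrarily chosen mount path:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-local-data
spec:
  containers:
    - name: app
      image: my-app:latest              # your application image
      volumeMounts:
        - name: app-data
          mountPath: /app-data          # path the application sees
  volumes:
    - name: app-data
      hostPath:
        path: /host_mnt/c/app-data      # or /C/app-data, depending on the Docker Desktop version
        type: DirectoryOrCreate
```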
You may also want to take a look at this answer.

Related

Do Kubernetes pods replicas share read-only file system from the underlying image?

Let's say I deployed 2 pods to Kubernetes and they both have the same underlying image which includes some read-only file system.
By default, do the pods share this file system? Or each pod copies the file system and hence has a separate copy of it?
I would appreciate any answer and especially would love to see some documentation or resources I can read in order to delve deeper into this issue.
Thanks in advance!
In short, it depends on where the pods are running. If they are running on the same node, then yes, they share the same read-only copy of the image; if they are on separate nodes, then each node has its own read-only copy of the image. Keep reading if you are interested in knowing more of the technical details.
Inside Kubernetes Pods
A pod can be viewed as a set of containers bound together. It is a construct provided by Kubernetes that gives you certain benefits out of the box. We can understand your question better if we zoom into a single node that is part of a Kubernetes cluster.
This node will have a kubelet binary running on it, which receives certain "instructions" from the api-server about running pods. These "instructions" are passed on to the CRI (Container Runtime Interface) implementation running on your node (let's assume it is the docker-engine), and this runtime is responsible for actually running the needed containers and reporting back to the kubelet, which reports back to the api-server, ultimately reporting to the pod-controller that the pod's containers are Running.
Now, the question becomes, do multiple pods share the same image? I said the answer is yes for pods on the same node and this is how it works.
Say you run the first pod: the docker daemon running on your k8s node pulls the image from the configured registry and stores it in the node's local cache. It then starts a container using this image. Note that a running container uses the image simply as a read-only filesystem, and depending on the storage driver configured in docker, you get a "writeable layer" on top of this read-only filesystem that allows you to read/write on the filesystem of your container. This writeable layer is temporary and vanishes when you delete the container.
When you run the second pod, the daemon finds that the image is already available locally, and simply creates the small writeable layer for your container, on top of an existing image from the cache and provides this as a "writeable file system" to your container. This speeds things up.
Now, in the case of docker, these read-only layers of the image (as one 'filesystem') are shared across all containers running on the host. This makes sense, since there is no need to copy a read-only filesystem, and sharing it with multiple containers is safe. Each container maintains its uniqueness by storing its data in the thin writeable layer that it has.
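As a small illustration of this sharing (not from the original answer; the names and node are made up), a Deployment that pins two replicas to the same node ends up with one cached copy of the image on that node, and each container only adds its own thin writeable layer:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shared-image-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: shared-image-demo
  template:
    metadata:
      labels:
        app: shared-image-demo
    spec:
      nodeName: worker-1        # hypothetical node; forces both pods onto one node
      containers:
        - name: web
          image: nginx:1.25     # both containers reuse this read-only image from the node's cache
```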
References
For further reading, you can use:
Read about storage drivers in docker. It explains how multiple containers share the r/o layer of the image.
Read details about different storage driver types to see how this "thin writeable layer" is implemented in practice by docker.
Read about container runtimes in Kubernetes to understand that docker isn't the only supported runtime. There are others but more or less, the same will hold true for them as well, as it makes sense to cache images locally and re-use the read-only image file system for multiple containers.
Read more about the kubelet component of Kubernetes to understand how it can support multiple run-times and how it helps the pod-controller setup and manage containers.
And of course, finally you can find more details about pods here. This will make a lot more sense after you've read the material above.
Hope this helps!

Sharing files between pods

I have a service that generates a picture. Once it's ready, the user will be able to download it.
What is the recommended way to share a storage volume between a worker pod and a backend service?
In general the recommended way is "don't". While a few volume providers support multi-mounting, it's very hard to do that in a way that isn't sadmaking. Preferably use an external service like AWS S3 for hosting the actual file content and store references in your existing database(s). If you need a local equivalent, check out Minio for simple cases.
Personally I would not recommend it. It is better to run the two containers side by side in one pod if they depend on each other, so that if the pod fails, the file-manager container is deleted and recreated along with it when needed.
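If you do go the two-containers-in-one-pod route, a minimal sketch of sharing an emptyDir volume between the worker and the backend could look like this (all names and paths are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: picture-service
spec:
  containers:
    - name: worker
      image: my-worker:latest       # hypothetical image that writes pictures into /output
      volumeMounts:
        - name: shared
          mountPath: /output
    - name: backend
      image: my-backend:latest      # hypothetical image that serves the generated pictures
      volumeMounts:
        - name: shared
          mountPath: /pictures
          readOnly: true
  volumes:
    - name: shared
      emptyDir: {}                  # lives only as long as the pod does
```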

Access the same file with several MongoDB instances

Problem
I have a project to do, and it consists of the following: I need to have a containerized database. When the load on the database goes up and the database gets overloaded, I need to spin up another container (pod) with the database. The problem is that the new pod needs to have some data preloaded (for reading purposes for the users), and when the load goes down, the new pod will get terminated and the data stored in it needs to be saved to a central database (to avoid losing it).
I'm using a Kubernetes cluster (Google Kubernetes Engine) for the pods and Mongo db. It kind of looks like this
DB Diagram
I know that the problem described above is probably not the recommended approach, but that's what they are asking us to do.
Now, the problem is that MongoDB does not allow you to do that (merge content from several databases into one database). A script that controls the pods (which need to be handled dynamically), pulls the data from them, and pushes it to the central database is complicated, and things like keeping track of the data that has already been pulled need to be taken care of.
My Idea of Solution
So, my idea was that all the containers point to the same volume. That means that the files in the directory '/data/db' (where mongo stores it files) of every pod are the same for all the pods because the same volume is mounted for all the pods. It kind of looks like this
Same Volume Mounted for each Pod
Kubernetes allows you to use several types of volumes. The ones that allow ReadWriteMany are NFS and CephFS, among others. I tried the example in this NFS link but it did not work. The first pod started successfully but the others got stuck in "Starting Container". I assume that the volume could not be mounted for the other pods because WriteMany was not allowed.
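For reference, an NFS-backed PersistentVolume and claim requesting ReadWriteMany look roughly like this (the server address, paths, and sizes are placeholders, not my actual setup):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.10         # hypothetical NFS server
    path: /exports/mongo
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""        # bind to the pre-created PV above rather than a provisioner
  resources:
    requests:
      storage: 10Gi
```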
I tried creating a CephFS cluster manually, but I am having some trouble using that cluster from the Kubernetes cluster. My question is: will CephFS do the job? Can CephFS handle several writers on the same file? If it can, will the Mongo pods come up successfully? NFS was supposed to work, but it didn't.

How to get files into pod?

I have a fully functioning Kubernetes cluster with one master and one worker, running on CoreOS.
Everything is working and my pods and services are running fine. Now I have no clue how to proceed with my webserver idea.
Before I go further: I have no configs yet for the idea I am going to explain; I have just done a lot of research.
When setting up a pod (nginx) with a service, you get the default nginx page. After that you can set up a volume mount with a host volume (volume mapping from host to container).
But let's say I want to separate every site (multiple sites, each in a different pod): how can I let my users add files to their pod's nginx document root? Running FTP on the CoreOS node bypasses the Kubernetes way and adds security vulnerabilities.
If someone can help me shed some light on this issue, that would be great.
Thanks for your time.
I'm assuming that you want to have multiple nginx servers running. The content of each nginx server is managed by a different admin (you called them users).
TL;DR:
Option 1: Each admin needs to build their own nginx docker image every time the static files change and deploy that new image. This is the approach if you consider these static files to be part of the source code of the nginx application.
Option 2: Use a persistent volume for nginx; the init-script for the nginx image should sync all of its files from something like S3 and then start nginx.
Before you proceed with building an application with kubernetes, the most important thing is to separate your services into 2 conceptual categories, and to give up your desire to touch the underlying nodes directly:
1) Stateless: These are services that are built by the developers and can be released. They can be stopped, started, moved from one node to another, their filesystem can be reset during restart and they will work perfectly fine. Majority of your web-services will fit this category.
2) Stateful: These services cannot be stopped and restarted willy nilly like the ones above. Primarily, their underlying filesystem must be persistent and remain the same across runs of the service. Databases, file-servers and similar services are in this category. These need special care and should use k8s persistent-volumes and now stateful-sets.
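As a rough sketch of the stateful pattern (the image, names, and sizes are placeholders), a StatefulSet with a volumeClaimTemplate keeps its data across restarts:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 1
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: database
          image: postgres:16          # hypothetical database image
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi
```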
Typical application:
nginx: build the nginx.conf into the docker image, and deploy it as a stateless service
rails/nodejs/python service: build the source code into the docker image, configure with env-vars, deploy as a stateless service
database: mount a persistent volume, configure with env-vars, deploy as a stateful service.
Separate sites:
Typically, I think at the level of a k8s deployment and a k8s service. Each site can be one k8s deployment and k8s service set. You can then have separate ways to expose them (different external DNS names/IPs).
Application users storing files:
This is firmly in the category of a stateful service. Use a persistent volume to mount to a /media kind of directory
Developers changing files:
Say developers or admins want to use FTP to change the files that nginx serves. The correct pattern is to build a docker image with the new files and then use that docker image. If there are too many files, and you don't consider those files to be a part of the 'source' of the nginx, then use something like s3 and a persistent volume. In your docker image init script, don't directly start nginx. Contact s3, sync all your files onto your persistent volume, then start nginx.
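One way to express that pattern in Kubernetes itself, rather than in the image's entrypoint, is an initContainer that syncs the bucket onto a shared volume before nginx starts. The bucket name and image tags below are assumptions, and the aws-cli container needs credentials supplied separately (for example via a secret or an IAM role):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-site
spec:
  initContainers:
    - name: sync-content
      image: amazon/aws-cli:latest            # assumes AWS credentials are provided out of band
      command: ["aws", "s3", "sync", "s3://my-site-bucket", "/usr/share/nginx/html"]
      volumeMounts:
        - name: web-content
          mountPath: /usr/share/nginx/html
  containers:
    - name: nginx
      image: nginx:1.25
      volumeMounts:
        - name: web-content
          mountPath: /usr/share/nginx/html
          readOnly: true
  volumes:
    - name: web-content
      emptyDir: {}                            # or a persistentVolumeClaim, as discussed above
```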
While the options and reasoning listed by iamnat are right, there's at least one more option to add to the list. You could consider using ConfigMap objects: maintain your files within the ConfigMap and mount them into your containers.
A good example can be found in the official documentation - check the Real World Example configuring Redis section to get some actionable input.
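A minimal sketch of the ConfigMap approach (the object names and file content are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: site-content
data:
  index.html: |
    <html><body>Hello from a ConfigMap</body></html>
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-from-configmap
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      volumeMounts:
        - name: content
          mountPath: /usr/share/nginx/html
  volumes:
    - name: content
      configMap:
        name: site-content
```

Keep in mind that a ConfigMap is capped at roughly 1 MiB, so this only suits small sets of files.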

How can I use Google Cloud Storage in a container deployed to the Google Container Engine?

Background
I have a Java servlet application that runs in Tomcat, which runs in a docker container, which runs on the Google Container Engine. It is no big deal to extend the docker image so that it also fetches and refreshes the certificates (there is only a single pod per domain, so no inter-pod communication is required). However, certbot needs to save its credentials and certificates somewhere, and the pod's filesystem seems like a bad idea because it is ephemeral and won't survive a pod restart. According to the table of storage options, Google Cloud Storage seems like a good idea, because it is very cheap, the volume is auto-sized, and I can also access it from multiple locations (I don't need to create one disk for each individual pod, which would be pretty much empty), including the web UI (the latter may be useful for debugging), and throughput and latency are really no issue for this use case.
Question
I created a bucket and now I want to access that bucket from a container. Google describes here and yet again here that I can mount the buckets using FUSE. What they don't mention is that you need to make the container privileged to use FUSE, which does not feel quite right to me. Additionally, I need to install the whole Google Cloud SDK and set up authentication (which I am going to store... where?). But actually I don't really need FUSE access. Just downloading the config on startup and uploading the config after each refresh would be enough. So something that works similarly to SCP would do...
There is gcloud, which can access files from the command line without the need for FUSE, but it still needs to be initialized somehow with credentials.
Here user326502 mentions
It won't work with zero configuration if the App Engine SDK is installed [..] As long as the container lives on a Google Compute Engine instance you can access any bucket in the same project.
He explains further that I magically don't need any credentials when I just use the library. I guess I could write my own copy application with those libraries, but the fact that I could not find something like this from anyone on the net makes me feel that I am completely on the wrong track.
So how would one actually access a google cloud storage bucket from within a container (as simple as possible)?
You can use gsutil to copy from the bucket to the local disk when the container starts up.
If you are running in Google Container Engine, gsutil will use the service account of the cluster's nodes (to do this, you'll need to specify the storage-ro scope when you create your cluster).
Alternatively, you can create a new service account, generating a JSON key. In Container Engine, you can store that key as a Kubernetes secret, and then mount the secret in the pod that needs to use it. From that pod, you'd configure gsutil to use the service account by calling gcloud auth activate-service-account --key-file /path/to/my/mounted/secret-key.json
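A sketch of wiring that together; the secret, bucket, paths, and the final application command are placeholders rather than anything from the original answer:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: certbot-runner
spec:
  containers:
    - name: app
      image: google/cloud-sdk:slim            # ships gcloud and gsutil
      command: ["/bin/sh", "-c"]
      args:
        - |
          gcloud auth activate-service-account --key-file /secrets/gcs/key.json
          gsutil cp -r gs://my-config-bucket/certbot /etc/letsencrypt
          exec my-app   # hypothetical: start the real workload afterwards
      volumeMounts:
        - name: gcs-key
          mountPath: /secrets/gcs
          readOnly: true
  volumes:
    - name: gcs-key
      secret:
        secretName: gcs-service-account       # e.g. kubectl create secret generic gcs-service-account --from-file=key.json
```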