How can I use Google Cloud Storage in a container deployed to the Google Container Engine?

Background
I have a Java servlet application that runs in Tomcat, which runs in a Docker container, which runs on the Google Container Engine. It is no big deal to extend the Docker image so that it also fetches and refreshes the certificates (there is only a single pod per domain, so no inter-pod communication is required). However, certbot needs to save its credentials and certificates somewhere, and the pod's filesystem seems like a bad idea because it is ephemeral and won't survive a pod restart. According to the table with storage options, Google Cloud Storage seems like a good idea: it is very cheap, the volume is auto-sized, and I can also access it from multiple locations (I don't need to create one disk for each individual pod, which would be pretty much empty), including the web UI (the latter may be useful for debugging), and throughput and latency are really no issue for this use case.
Question
I created a bucket and now I want to access that bucket from a container. Google describes here and again here that I can mount buckets using FUSE. What they don't mention is that you need to make the container privileged to use FUSE, which does not feel quite right to me. Additionally, I need to install the whole Google Cloud SDK and set up authentication (which I am going to store... where?). But actually I don't really need FUSE access. Just downloading the config on startup and uploading the config after each refresh would be enough, so something that works similarly to SCP would do.
There is gcloud, which can access files from the command line without the need for FUSE, but it still needs to be initialized with credentials somehow.
Here user326502 mentions
It won't work with zero configuration if the App Engine SDK is installed [..] As long as the container lives on a Google Compute Engine instance you can access any bucket in the same project.
He explains further that I magically don't need any credentials when I just use the library. I guess I could write my own copy application with those libraries, but the fact that I did not find anything like this from anyone on the net makes me feel that I am completely on the wrong track.
So how would one actually access a Google Cloud Storage bucket from within a container (as simply as possible)?

You can use gsutil to copy from the bucket to the local disk when the container starts up.
If you are running in Google Container Engine, gsutil will use the service account of the cluster's nodes (to do this, you'll need to specify the storage-ro scope when you create your cluster).
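For example, a minimal sketch of that first approach (the cluster name, bucket name, and paths are placeholders, not from the original answer):

# Create the cluster with read-only access to Cloud Storage.
# (Use the storage-rw scope instead if the pod also needs to upload renewed certificates back to the bucket.)
gcloud container clusters create my-cluster --scopes=storage-ro

# In the container's entrypoint, pull the certbot state before starting Tomcat.
# On a Container Engine node this authenticates with the node's service account, no key file needed.
gsutil -m cp -r gs://my-certbot-bucket/letsencrypt /etc/letsencrypt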
Alternatively, you can create a new service account and generate a JSON key for it. In Container Engine, you can store that key as a Kubernetes secret and then mount the secret in the pod that needs to use it. From that pod, you'd configure gsutil to use the service account by calling gcloud auth activate-service-account --key-file /path/to/my/mounted/secret-key.json
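A rough sketch of that secret-based approach (all names, paths, and the image are placeholders, and it assumes the image already contains the Cloud SDK, i.e. gcloud and gsutil):

# Store the downloaded service-account key as a Kubernetes secret.
kubectl create secret generic gcs-key --from-file=key.json=/local/path/to/key.json

# Mount the secret into the pod and activate it before using gsutil.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: certbot-pod
spec:
  containers:
  - name: app
    image: my-tomcat-image          # placeholder image with the Cloud SDK installed
    command: ["/bin/sh", "-c"]
    args:
      - >
        gcloud auth activate-service-account --key-file /secrets/gcs/key.json &&
        gsutil cp -r gs://my-certbot-bucket/letsencrypt /etc/letsencrypt &&
        catalina.sh run
    volumeMounts:
    - name: gcs-key
      mountPath: /secrets/gcs
      readOnly: true
  volumes:
  - name: gcs-key
    secret:
      secretName: gcs-key
EOF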

Related

How can I use GCP NFS Filestore on k8 cluster with TPUs?

I'm using GKE to run K8s workloads and want to add TPU support. From the GCP docs, I "need" to attach a GCS bucket so the Job can read models and store logs. However, we already create shared NFS mounts for our K8s clusters. How hard of a requirement is it to "need" GCS to use TPUs? Can shared Filestore NFS mounts work just fine? What about using GCS Fuse?
I'm trying to avoid having the cluster user know about the backend file system (NFS vs GCS), and just know that the files they provide will be available at "/home/job". Since the linked docs show passing a gs://mybucket/some/path value for the file system parameters, I'm not sure if a /home/job value will still work. Does the TPU access the filesystem directly, and is it only compatible with GCS? Or do the nodes access the filesystem (preferring GCS) and then share the data (in memory) with the TPUs?
I'll try it out to learn the hard way (and report back), but curious if others have experience with this already.

Mapping local directory to kubernetes

I am using Docker Desktop to run an application on the Kubernetes platform, where I need a location to store files. How can I point my local directory (c:\app-data) at the application running in Kubernetes?
I had a similar problem. Docker containers are usually meant to be throwaway/gateway containers, so people don't usually use them for storing files.
That being said, you have two options:
Add the path and files to the docker container, which will cause your docker container to be massive in size (NOT RECOMMENDED). The docker build will require substantial time and memory, as all the files will be copied. Here's an example of creating a local ubuntu container with docker: https://thenewstack.io/docker-basics-how-to-share-data-between-a-docker-container-and-host/
Host your files through another server/API, and fetch those files using simple requests in your app. I used this solution. The only caveat is you need to be able to host your files somehow. This is easy enough, but may require extra payment. https://www.techradar.com/best/file-hosting-and-sharing-services
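For that second option, the fetch can be as simple as an entrypoint or initContainer that downloads the files over HTTP at startup (the URL and target path below are hypothetical):

# Pull the data into the container at startup instead of baking it into the image.
curl -fsSL -o /app-data/config.json https://files.example.com/app-data/config.json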
You can't really do this. The right approach depends on what the data you're trying to store is.
If you're just trying to store data somewhere – perhaps it's the backing data for a MySQL StatefulSet – you can create a PersistentVolumeClaim like normal. Minikube includes a minimal volume provisioner so you should automatically get a PersistentVolume created; you don't need to do any special setup for this. But, the PersistentVolume will live within the minikube container/VM; if you completely delete the minikube setup, it could delete that data, and you won't be able to directly access the data from the host.
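A minimal PersistentVolumeClaim along those lines, assuming the cluster's default StorageClass handles provisioning (minikube ships one); the name and size are illustrative:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF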
If you have a data set on the host that your container needs to access, there are a couple of ways to do it. Keep in mind that, in a "real" Kubernetes cluster, you won't be able to access your local filesystem at all. Creating a PersistentVolume as above and then running a pod to copy the data into it could be one approach; as @ParmandeepChaddha suggests in their answer, baking the data into the image is another option (this can be very reasonable if the data is only a couple of megabytes).
If the data is the input or output data to your process, you can also consider restructuring your application so that it transfers that data over a protocol like HTTP. Set up a NodePort Service in front of your application, and use a tool like curl to HTTP POST the data into the service.
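Sketched out, that might look like the following (the service name, node port, file path, and /upload endpoint are made up for illustration):

# Expose the application outside the cluster.
kubectl expose deployment my-app --type=NodePort --port=8080

# Note the node port Kubernetes assigned (e.g. 30080), then POST the local file to it.
kubectl get service my-app
curl -X POST --data-binary @c:/app-data/input.csv http://localhost:30080/upload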
Finally, you could be considering a setup where all of the important data is local: you have some batch files on the local system, the job's purpose is to convert some local files to other local files, and it's just that the program is in minikube. (Or, similarly, you're trying to develop your application and the source files are on your local system.) In this case Kubernetes, as a distributed, clustered container system, isn't the right tool. Running the application directly on your system is the best approach; you can simulate this with a docker run -v bind mount, but this is inconvenient and can lead to permission and environment problems.
(In theory you can use a hostPath volume too, and minikube has some support to mount a host directory into the VM. In practice, the setup required to do this is as complex as the rest of your Kubernetes setup combined, and it won't be portable to any other Kubernetes installation. I wouldn't attempt this.)
You can mount your local directory to your kubernetes Pod using hostPath. Your path c:\app-data on your Windows host should be represented as either /C/app-data or /host_mnt/c/app-data, depending on your Docker Desktop version as suggested in this comment.
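For example, a hostPath volume along those lines (the image is a placeholder, and the exact mount prefix depends on your Docker Desktop version):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: my-app:latest            # placeholder image
    volumeMounts:
    - name: app-data
      mountPath: /app-data
  volumes:
  - name: app-data
    hostPath:
      path: /host_mnt/c/app-data    # or /C/app-data, depending on the Docker Desktop version
      type: DirectoryOrCreate
EOF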
You may also want to take a look at this answer.

Network File share access on GKE cluster - Windows node pool

We are containerizing a dotnet application on a GKE cluster (Windows node pool). We have a requirement where multiple pods must be able to access the same shared space (persistent volume), and it should support the "ReadWriteMany" access mode. We have explored the options below:
GCE Persistent Disk accessed through a Persistent Volume (it doesn't support ReadWriteMany; only one pod can access the disk).
Network File Share (NFS): currently not supported for Windows node pools.
Filestore fits the requirement, but it is expensive and managed by Google.
We are looking for other options that fit our requirement. Please help.
You are right that NFS isn't yet supported on Windows, at least not by the built-in client for NFS v4. As long as there is no support for NFS v4, the Kubernetes team cannot start this work in k8s (source).
With this constraint, the only solution I can see remains the Filestore.
I've been trying to solve the same problem - accessing a shared filesystem from 2 Windows pods (an ASP.NET application on IIS + a console application). I wasn't able to use Filestore because it requires an NFS client (Install-WindowsFeature NFS-Client), and I couldn't install it into the containers (during container build or at runtime) since it requires a computer restart - maybe I'm missing something here.
The options I've found:
If you need to create a simple temporary demo application that can run on a single VM, you can run both pods on a single instance, create a Persistent Disk, attach it to the instance with gcloud compute instances attach-disk, RDP into the instance, mount the disk, and provide the disk to the pods as a hostPath (a rough sketch follows after this list of options).
That's the solution I'm using now.
Create an SMB share (on a separate VM, or using a Docker container such as https://hub.docker.com/r/dperson/samba/) and access it from the pods using New-SmbMapping -LocalPath $shareletter -RemotePath $dhcpshare -Username $shareuser -Password $sharepasswd -Persistent $true. This solution worked for my console application, but the web application couldn't access the files (even though I set the application pool on IIS to run as Local System). The SMB share could also be mounted from the instance using New-SmbGlobalMapping - the flexvolume does that: https://github.com/microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows. I haven't explored that option and I think it would have the same problem (IIS not seeing the files).
I think the best (most secure and reliable) solution would be to set up an Active Directory Domain Controller and an SMB share on a separate VM and give the containers access to it using gMSA: https://learn.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/manage-serviceaccounts
https://kubernetes.io/docs/tasks/configure-pod-container/configure-gmsa/
That doesn't seem easy though.
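A rough sketch of the first option above, the attached-disk hostPath setup (the disk name, zone, node name, image, paths, and the Windows drive letter are all placeholders; the drive letter depends on how you bring the disk online on the node):

# Create a disk and attach it to the (single) Windows node.
gcloud compute disks create shared-data --size=100GB --zone=us-central1-a
gcloud compute instances attach-disk my-windows-node --disk=shared-data --zone=us-central1-a

# RDP into the node, bring the disk online and format it (for example as D:),
# then expose it to both pods as a hostPath volume:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  nodeSelector:
    kubernetes.io/os: windows
  containers:
  - name: web
    image: my-dotnet-image          # placeholder image
    volumeMounts:
    - name: shared
      mountPath: 'C:\shared-data'
  volumes:
  - name: shared
    hostPath:
      path: 'D:\shared'             # wherever the attached disk was mounted on the node
      type: Directory
EOF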

Sharing files between pods

I have a service that generates a picture. Once it's ready, the user will be able to download it.
What is the recommended way to share a storage volume between a worker pod and a backend service?
In general the recommended way is "don't". While a few volume providers support multi-mounting, it's very hard to do that in a way that isn't sadmaking. Preferably use an external service like AWS S3 for hosting the actual file content and store references in your existing database(s). If you need a local equivalent, check out Minio for simple cases.
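Since this thread is about GKE, the same pattern with a GCS bucket might look like the following (the bucket, file names, and key path are made up; gsutil signurl needs a service-account key file):

# Worker: upload the generated picture to the bucket and store the object name in the database.
gsutil cp /tmp/picture-1234.png gs://my-pictures-bucket/picture-1234.png

# Backend: hand the user a download link that is valid for 10 minutes.
gsutil signurl -d 10m /secrets/gcs/key.json gs://my-pictures-bucket/picture-1234.png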
Personally, I would not recommend it. Better to run the two containers side by side in one pod if they depend on each other, so that if one container fails, the file-handling container is also deleted and recreated along with it when needed.

How to manage file uploads with GKE?

I'm trying to run an API (based on Symfony) on Kubernetes thanks to Google Container Engine (GKE).
This API also allows users to store and download files, which are supposed to be saved somewhere.
I tried to run it with 1 replica, and noticed a downtime of the service during the creation of the new container. It looks like at least 2 replicas are needed to avoid downtime.
Taking that into consideration, I'm interested in these options:
A volume based on a Google Persistent Disk. Would this mean that all my replicas would be on the same node (ReadWriteOnce access mode)? If so, in case of a node failure, my service would not be available.
A volume based on Flocker (Backend Persistent Disk). What is the recommended way to install it on GKE?
Is there another interesting option? What would you recommend?
Using GCS (like tex mentioned) is probably the simplest solution (and will be very fast from a GKE cluster). Here is an answer that may help.
If you have a specific need for local persistent storage, you can use Google Persistent Disks, but they can only be mounted as writable in one place.
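For illustration, a single-writer pod using such a disk might look like this (the image and disk name are placeholders; the disk must already exist in the same zone as the node, and only one pod can mount it read-write at a time):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
  - name: api
    image: my-symfony-image         # placeholder image
    volumeMounts:
    - name: uploads
      mountPath: /var/www/uploads
  volumes:
  - name: uploads
    gcePersistentDisk:
      pdName: uploads-disk          # pre-existing GCE Persistent Disk
      fsType: ext4
EOF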
PetSets (currently alpha) will provide better support for distributed persistent in-cluster storage, so you can also look into that if GCS doesn't work for you.