How to manage file uploads with GKE? - kubernetes

I'm trying to run an API (based on Symfony) on Kubernetes using Google Container Engine (GKE).
This API also allows users to store and download files, which need to be saved somewhere.
I tried running it with 1 replica and noticed downtime of the service while the new container was being created. It looks like at least 2 replicas are needed to avoid downtime.
Taking that into consideration, I'm interested in these options:
A volume based on Google Persistent Disk. Would this mean that all my replicas would be on the same node (ReadWriteOnce access mode)? If so, in case of a node failure, my service would not be available.
A volume based on Flocker (with a Persistent Disk backend). What is the recommended way to install it on GKE?
Is there another interesting option? What would you recommend?

Using GCS (as tex mentioned) is probably the simplest solution (and it will be very fast from a GKE cluster). Here is an answer that may help.
If you have a specific need for local persistent storage, you can use Google Persistent Disks, but they can only be mounted as writable in one place.
PetSets (currently alpha) will provide better support for distributed persistent in-cluster storage, so you can also look into that if GCS doesn't work for you.
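If you do go the Persistent Disk route, a minimal sketch of the claim on GKE might look like this (assuming the default standard StorageClass, which dynamically provisions a GCE Persistent Disk; the name and size are illustrative):

```yaml
# Hypothetical claim for illustration: on GKE the default "standard"
# StorageClass dynamically provisions a Google Persistent Disk.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: uploads-pvc            # illustrative name
spec:
  accessModes:
    - ReadWriteOnce            # a GCE PD can only be mounted writable on one node
  storageClassName: standard   # GKE's default PD-backed class
  resources:
    requests:
      storage: 10Gi
```

Because the access mode is ReadWriteOnce, only pods on the node that currently holds the disk can write to it, which is exactly why GCS is usually the simpler choice for user uploads.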

Related

How can I use GCP NFS Filestore on k8 cluster with TPUs?

I'm using GKE to run K8s workloads and want to add TPU support. From the GCP docs, I "need" to attach a GCS bucket so the Job can read models and store logs. However, we already create shared NFS mounts for our K8s clusters. How hard of a requirement is it to "need" GCS to use TPUs? Can shared Filestore NFS mounts work just fine? What about using GCS Fuse?
I'm trying to avoid having the cluster user know about the back-end file system (NFS vs GCS), and just know that the files they provide will be available at "/home/job". Since the linked docs show passing a gs://mybucket/some/path value for the file-system parameters, I'm not sure if a /home/job value will still work. Does the TPU access the filesystem directly and is only compatible with GCS? Or do the nodes access the filesystem (preferring GCS) and then share the data (in memory) with the TPUs?
I'll try it out to learn the hard way (and report back), but I'm curious if others already have experience with this.

Kubernetes cluster Mysql Nodes Storage

We have started setting up a Kubernetes cluster. In production, we have 4 MySQL nodes (2 active masters, 2 active slaves). All servers are on-premise; there is no cloud provider involved.
Now how do I configure storage? I mean, should I use PV / PVC? How will that work? Should I use local PVs? Can someone explain this to me?
You need to use PersistentVolumes and PersistentVolumeClaims in order to achieve that.
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes.
A PersistentVolumeClaim (PVC) is a request for storage by a user. Claims can request specific size and access modes (e.g., they can be mounted once read/write or many times read-only).
Containers are ephemeral: when a container is restarted, all changes made before the restart are lost. Databases, however, expect their data to be persistent, therefore you need persistent volumes. You have to create a storage claim, and the pod must be configured to mount the claimed storage.
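For an on-premise cluster with no cloud provider, that usually means statically provisioned local PersistentVolumes. Here is a minimal sketch; the node name, path, size, and class name are illustrative assumptions, not values from your setup:

```yaml
# Illustrative local PV/PVC for an on-premise node; adjust node name,
# path, size, and storage class to your environment.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # local PVs are not dynamically provisioned
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /mnt/disks/mysql                  # pre-created directory on the node
  nodeAffinity:                             # local volumes must be pinned to a node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - db-node-1                 # hypothetical node name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 50Gi
```

The MySQL pod (or, better, a StatefulSet) then references mysql-pvc under volumes.persistentVolumeClaim and mounts it at /var/lib/mysql; with WaitForFirstConsumer the claim only binds once a pod is scheduled onto the node that actually holds the disk.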
Here you will find a simple guide showing how to deploy MySQL with a PersistentVolume. However, I strongly recommend getting familiar with the official docs that I have linked in order to fully understand the concept and adjust the access mode, class, size, etc according to your needs.
Please let me know if that helped.

Is it Appropriate to Store Database in a Kubernetes Persistent Volume (And how to back up?)

I have a web application running on a Google Kubernetes cluster. My web app also uses persistent volumes for multiple MongoDB databases to store user and application data.
(1) Thus I am wondering if it is practical to store all data inside those persistent volumes in the long-run?
(2) Are there any methods for safely backing up the persistent volumes e.g. on a weekly basis (automatically)?
(3) I am also planning to integrate some kind of file upload into the application. Are persistent volumes capable of storing many GB/TB of data, or should I opt for something like Google cloud storage in this case?
Deploying stateful apps on Kubernetes is a bit painful, which is well known in the Kubernetes community. Usually, if we need HA for databases, they should be deployed in cluster mode. In Kubernetes, if you want to deploy in cluster mode, you need to look at the StatefulSet concept. Anyway, I'm pasting links for your questions so that you can start from there.
(1) Thus I am wondering if it is practical to store all data inside those persistent volumes in the long-run?
Running MongoDB on Kubernetes with StatefulSets
(2) Are there any methods for safely backing up the persistent volumes e.g. on a weekly basis (automatically)?
Persistent Volume Snapshots
Volume Snapshot (Beta from K8s docs)
You can google even more docs.
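For reference, a snapshot request from those docs boils down to something like this; it assumes a CSI driver with snapshot support and a VolumeSnapshotClass, and the names are placeholders:

```yaml
# Illustrative on-demand snapshot of a PVC; requires a CSI driver with
# snapshot support and a VolumeSnapshotClass (names here are assumptions).
apiVersion: snapshot.storage.k8s.io/v1      # v1beta1 on older clusters
kind: VolumeSnapshot
metadata:
  name: mongodb-data-snapshot
spec:
  volumeSnapshotClassName: csi-snapshot-class
  source:
    persistentVolumeClaimName: mongodb-data   # hypothetical PVC holding the database
```

The snapshot object itself is one-off; for a weekly cadence you would need to create these on a schedule, e.g. from a CronJob or an external backup tool.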
(3) I am also planning to integrate some kind of file upload into the application. Are persistent volumes capable of storing many GB/TB of data, or should I opt for something like Google cloud storage in this case?
I'm not sure whether it can hold TBs, but if you have a cloud available, definitely consider using it.
Yes, you can use a PVC in Kubernetes to store the data. However, it depends on your application's use case and size.
In Kubernetes you can deploy MongoDB as a cluster that stores its data inside PVCs. A MongoDB Helm chart is available for HA; you can also look at that.
Helm chart: https://github.com/helm/charts/tree/master/stable/mongodb
It's suggested to run MongoDB as a single pod or as a StatefulSet on Kubernetes.
Backup:
For backing up the MongoDB database, you can take a snapshot of the disk storage (PVC) weekly; along with that, you can also use a MongoDB snapshot.
Most people choose a managed service, but it still depends on your organization.
Backup methods:
MongoDB snapshot
Disk storage snapshot
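As a concrete (hypothetical) sketch of the weekly, automated variant, a Kubernetes CronJob could run mongodump and write the archive to a backup volume; the service name, image tag, schedule, and claim name below are assumptions to adapt:

```yaml
# Illustrative weekly backup CronJob; service name, image tag,
# schedule, and PVC name are assumptions to adapt.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mongodb-backup
spec:
  schedule: "0 3 * * 0"               # every Sunday at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: mongodump
              image: mongo:6.0        # official image ships the mongodump tool
              command:
                - sh
                - -c
                - mongodump --uri=mongodb://mongodb:27017 --gzip --archive=/backup/dump-$(date +%F).gz
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: mongodb-backup-pvc   # hypothetical claim for backup storage
```

This covers the logical-backup side; the disk-storage side is the VolumeSnapshot approach shown earlier.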
Filesystem:
Yes, it can handle TBs of data, as it's ultimately a disk volume or file system.
Yes, you can use a PVC as the file system, but later on you may hit scaling issues: a PVC is ReadWriteOnce, so if you want to scale the application along with the PVC you have to implement ReadWriteMany.
There are also several methods to achieve this; you can directly mount a file system into the pod, like AWS EFS, but you may find it slow for file operations.
For the file system there are various options available in Kubernetes, like CSI drivers, GlusterFS, MinIO, and EFS.
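If you end up needing shared access, the claim side is just a ReadWriteMany PVC. A minimal sketch, assuming some shared-filesystem backend (NFS/Filestore, EFS, GlusterFS, etc.) exposes a StorageClass; the nfs-client name here is an assumption:

```yaml
# Illustrative ReadWriteMany claim; the "nfs-client" StorageClass is an
# assumption and must be provided by whichever shared-filesystem backend
# is installed in the cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 100Gi
```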

Sharing files between pods

I have a service that generates a picture. Once it's ready, the user will be able to download it.
What is the recommended way to share a storage volume between a worker pod and a backend service?
In general the recommended way is "don't". While a few volume providers support multi-mounting, it's very hard to do that in a way that isn't sadmaking. Preferably use an external service like AWS S3 for hosting the actual file content and store references in your existing database(s). If you need a local equivalent, check out Minio for simple cases.
Personally, I would not recommend doing it. Better than that, run the two containers side by side in one pod if they depend on each other; that way, if the pod fails, the file-handling container is deleted and recreated along with it as needed.
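If the worker and the backend really can live in the same pod, a minimal sketch of that side-by-side setup would share an emptyDir volume between the two containers; the names and images are placeholders:

```yaml
# Illustrative pod with two containers sharing an ephemeral emptyDir volume;
# images and names are placeholders, and the data does not survive pod deletion.
apiVersion: v1
kind: Pod
metadata:
  name: picture-service
spec:
  containers:
    - name: worker                  # generates the picture
      image: example.com/picture-worker:latest
      volumeMounts:
        - name: shared
          mountPath: /output
    - name: backend                 # serves the picture for download
      image: example.com/picture-backend:latest
      volumeMounts:
        - name: shared
          mountPath: /srv/pictures
  volumes:
    - name: shared
      emptyDir: {}
```

Note that emptyDir data does not survive pod deletion; if the two components must stay in separate pods, the object-store approach from the previous answer scales better.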

Kubernetes - EBS for storage

I have a question regarding the best approach with K8s on AWS.
The way I see it, either I use EBS directly for the PV and PVC, or I mount the EBS volume as a regular folder on my EC2 instance and then use those mounted folders for my PV and PVC.
Which approach is better in your opinion?
It is important to note that I want my K8s setup to be cloud agnostic, so maybe forcing an EBS configuration is worse than using a folder, since then the EC2 instance does not care about the origin of the folder.
many thanks
Which approach is better in your opinion?
Without question: using the PV and PVC. Half the reason will go here, and the other half below. By declaring those as managed resources, kubernetes will cheerfully take care of attaching the volumes to the Node it is scheduling the Pod upon, and detaching it from the Node when the Pod is unscheduled. That will matter in a huge way if a Node reboots, for example, because the attach-detach cycle will happen transparently, no Pager Duty involved. That will not be true if you are having to coordinate amongst your own instances who is alive and should have the volume attached at this moment.
It is important to note that I want my K8s setup to be cloud agnostic, so maybe forcing an EBS configuration is worse than using a folder, since then the EC2 instance does not care about the origin of the folder.
It will still be cloud agnostic, because what you have told kubernetes -- declaratively, I'll point out, using just text in a YAML file -- is that you wish for some persistent storage to be volume mounted into your container(s) before they are launched. Only drilling down into the nitty gritty will surface the fact that it's provided by an AWS EBS volume. I would almost guarantee you could move those descriptors over to GKE (or Azure's thing) with about 90% of the text exactly the same.
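To make that concrete, here is a minimal sketch assuming the in-tree aws-ebs provisioner (newer clusters would use the ebs.csi.aws.com CSI driver instead); only the StorageClass knows about EBS, and the PVC just names the class:

```yaml
# Illustrative manifests: only the StorageClass knows about EBS; the PVC
# (and the Pod that mounts it) reference the class by name, so they can
# move to another provider by swapping the provisioner.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/aws-ebs     # swap for gce-pd, azure-disk, or a CSI driver elsewhere
parameters:
  type: gp2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast
  resources:
    requests:
      storage: 20Gi
```

Moving to another cloud means swapping the StorageClass, while the PVC and the Pod specs that mount it stay the same, which is the ~90% mentioned above.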