Multiple Kubernetes pods sharing the same hostPath/PVC will duplicate output

I have a small problem and need to know the best way to approach/solve it.
I have deployed a few pods on Kubernetes and so far I have enjoyed learning about and working with it. I set up the persistent volume, volume claim, etc., and I can see my data on the host, as I need those files for further processing.
The issue is that 2 pods (2 replicas) sharing the same volume claim are writing to the same location on the host. That is expected, but it unfortunately causes the data to be duplicated in the output file.
What I need is a unique output from each pod on the host. Is the only way to achieve this to have two deployment files, in my case, each using a different volume claim/persistent volume? At the same time I am not sure whether that is an optimal approach for future updates, upgrades, keeping a certain number of pods available, etc.
Or can I keep one deployment file with 2 or more replicas and still avoid the output duplication when sharing the same PVC?
Please note that I have a single-node deployment, which is why I'm using hostPath at the moment.
creating pv:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: ls-pv
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/ls-data/my-data2"
claim-pv:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ls-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
How I use my pv inside my deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: logstash
  namespace: default
  labels:
    component: logstash
spec:
  replicas: 2
  selector:
    matchLabels:
      component: logstash
  #omitted
        ports:
        - containerPort: 5044
          name: logstash-input
          protocol: TCP
        - containerPort: 9600
          name: transport
          protocol: TCP
        volumeMounts:
        - name: ls-pv-store
          mountPath: "/logstash-data"
      volumes:
      - name: ls-pv-store
        persistentVolumeClaim:
          claimName: ls-pv-claim

Depending on what exactly you are trying to achieve, you could use StatefulSets instead of Deployments. Each Pod spawned from the StatefulSet's Pod template can have its own separate PersistentVolumeClaim, created from the volumeClaimTemplates (see the link for an example). You will need a StorageClass set up for this.
If you are looking for something simpler, you can write to /mnt/volume/$HOSTNAME from each Pod. This also ensures that they use separate files, as the hostnames of the Pods are unique.
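For illustration, here is a minimal sketch of how the logstash Deployment above could be turned into a StatefulSet with a volumeClaimTemplate. The StorageClass name standard and the image tag are placeholders, and a headless Service matching serviceName would still be needed:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: logstash
  labels:
    component: logstash
spec:
  serviceName: logstash                # requires a headless Service named "logstash"
  replicas: 2
  selector:
    matchLabels:
      component: logstash
  template:
    metadata:
      labels:
        component: logstash
    spec:
      containers:
      - name: logstash
        image: docker.elastic.co/logstash/logstash:7.17.0   # placeholder; use your own image
        ports:
        - containerPort: 5044
          name: logstash-input
          protocol: TCP
        volumeMounts:
        - name: ls-pv-store
          mountPath: "/logstash-data"
  volumeClaimTemplates:                # one PVC (and PV) per Pod
  - metadata:
      name: ls-pv-store
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: standard       # placeholder; any StorageClass with dynamic provisioning
      resources:
        requests:
          storage: 100Gi

Each replica then gets its own claim (ls-pv-store-logstash-0, ls-pv-store-logstash-1, ...), so nothing is written to a shared file. The per-hostname alternative keeps your existing shared PVC and only changes where each replica writes, e.g. a subdirectory like /logstash-data/$HOSTNAME created by the container's entrypoint.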

Related

How to mount a directory from a container into the host

I created a deployment YAML for a microservice.
I am using the hostPath volume type for the PersistentVolume and have to copy data to a path on the host. I want to mount a directory from the container into the host, because the data is in the container and I need it on the host.
My deployment yaml:
#create persistent volume
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-vol
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /opt/storage/app
#create persistent volume claim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
#create Deployment
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      deploy: app
  template:
    metadata:
      labels:
        deploy: app
    spec:
      hostname: app
      hostNetwork: false
      containers:
      - name: app
        image: 192.168.10.10:2021/project/app:latest
        volumeMounts:
        - mountPath: /opt/app
          name: project-volume
      volumes:
      - name: project-volume
        persistentVolumeClaim:
          claimName: app-pv-claim
Due to information gaps, I am writing a general answer.
First of all you should know:
HostPath volumes present many security risks, and it is a best practice to avoid the use of HostPaths when possible. When a HostPath volume must be used, it should be scoped to only the required file or directory, and mounted as ReadOnly.
But the use of hostPath also offers a powerful escape hatch for some applications.
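As an illustration of that best practice, a hostPath volume can be scoped to a single directory and mounted read-only. A minimal sketch, using the /opt/storage/app path from the question; the Pod and container names are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: app-reader                # placeholder: a Pod that only reads the shared data
spec:
  containers:
  - name: reader
    image: busybox
    command: ["sh", "-c", "ls /data && sleep 3600"]
    volumeMounts:
    - name: host-data
      mountPath: /data
      readOnly: true              # mounted as ReadOnly, per the best practice quoted above
  volumes:
  - name: host-data
    hostPath:
      path: /opt/storage/app      # scoped to only the required directory
      type: Directory             # fail if the path does not already exist on the node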
If you still want to use it, you should first check whether both pods (the one that created the data and the one that wants to access it) are on the same node. The following command will show you that.
kubectl get pods -o wide
All data created by either pod stays in the hostPath directory and is available to every pod, as long as they are running on the same node.
See also this documentation about hostPath.
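If you want to guarantee that both pods (and any future replicas) land on the same node, and therefore see the same hostPath directory, one way is to label that node and add a nodeSelector to the Pod template. A minimal sketch, where <node-name> and the disk=local label are placeholders and app-deployment is the Deployment from the question:

kubectl label node <node-name> disk=local
kubectl patch deployment app-deployment \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"disk":"local"}}}}}'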

Kubernetes - pod is Pending when persistent volumes are attached while scaling the pod (GKE)

I have created a deployment in the xyz-namespace namespace; it has a PVC. I can create the deployment and access it, and it works properly, but when I scale the deployment from the Kubernetes console the new pod stays in the Pending state.
persistent_claim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins
  namespace: xyz-namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 5Gi
And the Deployment object is like below.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: db-service
  namespace: xyz-namespace
  labels:
    k8s-app: db-service
    Name: db-service
    ServiceName: db-service
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: data
      Name: db-service
      ServiceName: db-service
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: jenkins
        tier: data
        Name: db-service
        ServiceName: db-service
    spec:
      hostname: jenkins
      initContainers:
      - command:
        - "/bin/sh"
        - "-c"
        - chown -R 1000:1000 /var/jenkins_home
        image: busybox
        imagePullPolicy: Always
        name: jenkins-init
        volumeMounts:
        - name: jenkinsvol
          mountPath: "/var/jenkins_home"
      containers:
      - image: jenkins/jenkins:lts
        name: jenkins
        ports:
        - containerPort: 8080
          name: jenkins1
        - containerPort: 8080
          name: jenkins2
        volumeMounts:
        - name: jenkinsvol
          mountPath: "/var/jenkins_home"
      volumes:
      - name: jenkinsvol
        persistentVolumeClaim:
          claimName: jenkins
      nodeSelector:
        nodegroup: xyz-testing
The Deployment is created fine and works as well, but when I try to scale it from the console the new pod gets stuck in the Pending state.
If I remove the persistent volume and then scale, it works fine, but with the persistent volume it does not.
When using the standard storage class, I assume you are using the default GCEPersistentDisk volume plugin. In that case you cannot choose the access modes freely, as they are determined by the storage provider (GCP in your case, as you are using GCE persistent disks): these disks only support the ReadWriteOnce (RWO) and ReadOnlyMany (ROX) access modes. If you try to create a ReadWriteMany (RWX) PV, it will never reach a successful state (which would be your case if you set the PVC with accessModes: ReadWriteMany).
Also, if a pod tries to attach a ReadWriteOnce volume on some other node, you'll get the following error:
FailedMount Failed to attach volume "pv0001" on node "xyz" with: googleapi: Error 400: The disk resource 'abc' is already being used by 'xyz'
The information above comes from this article.
As mentioned here and here, NFS is the easiest way to get ReadWriteMany, as all nodes need to be able to read and write to the storage device you are using for your pods.
So I would suggest you use an NFS storage option. In case you want to test it, here is a good guide by Google using its Filestore solution, which provides fully managed NFS file servers.
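For illustration, this is roughly what a statically provisioned, NFS-backed volume with ReadWriteMany could look like; the server address and export path are placeholders, and the PVC would replace the jenkins claim from the question:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: jenkins-nfs-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany              # NFS supports RWX, unlike GCE persistent disks
  nfs:
    server: 10.0.0.2             # placeholder: your NFS/Filestore server IP
    path: /exports/jenkins       # placeholder: the exported directory
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins
  namespace: xyz-namespace
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""           # bind to the statically created PV above instead of "standard"
  resources:
    requests:
      storage: 5Gi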
Your PersistentVolumeClaim is set to:
accessModes:
  - ReadWriteOnce
But it should be set to:
accessModes:
  - ReadWriteMany
The ReadWriteOnce access mode means that
the volume "can be mounted as read-write by a single node" [1].
When you scale your deployment, the new pod is most likely scheduled onto a different node; therefore you need ReadWriteMany.
[1] https://kubernetes.io/docs/concepts/storage/persistent-volumes/

MongoDB kubernetes local storage two nodes

I am using kubeadm locally on two physical machines. I don't have any cloud resources, and I want to build MongoDB auto-scaling (locally for a start, maybe later in the cloud), so I have to use the local storage of my two physical machines. I suppose I have to create a local storage class and volumes. I am very new to Kubernetes, so don't judge me too hard. As I read here https://kubernetes.io/blog/2019/04/04/kubernetes-1.14-local-persistent-volumes-ga/, local persistent volumes are only for one node? Is there any way to take advantage of both physical machines' storage and build simple MongoDB scaling, using the Kubernetes MongoDB operator and Ops Manager? I made a few tests, but I could not achieve my goal: "pod has unbound immediate PersistentVolumeClaims" (Ops Manager).
What I was thinking in the first place was to "break" my two hard drives into many pieces and use sharding for MongoDB scaling.
Thanks in advance.
Well, you can use an NFS server with the same volume mounted on both nodes, sharing the same mount point.
Please be aware this approach is not recommended for production.
There are tons of howtos on how to configure an NFS server, for example:
https://www.tecmint.com/install-nfs-server-on-ubuntu/
https://www.tecmint.com/how-to-setup-nfs-server-in-linux/
With NFS working, you can use hostPath to mount the NFS directory into your pods:
Create the PV and the PVC:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/nfs/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
And use the volume in your deployment file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-pv
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-pv
  template:
    metadata:
      labels:
        app: test-pv
    spec:
      containers:
      - image: nginx
        name: nginx
        volumeMounts:
        - mountPath: /data
          name: pv-storage
      volumes:
      - name: pv-storage
        persistentVolumeClaim:
          claimName: pv-claim
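As a side note, instead of going through hostPath, Kubernetes can also mount the NFS export directly with the nfs volume source, so the nodes do not need the share mounted beforehand. A minimal sketch, where the server IP and export path are placeholders for your own NFS setup:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-volume-nfs
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany            # NFS allows both nodes to mount the volume at once
  nfs:
    server: 192.168.1.100      # placeholder: IP of your NFS server
    path: /srv/nfs/data        # placeholder: the exported directory

If you go this route, the PVC's accessModes would have to request ReadWriteMany as well so it can bind to this volume.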

Kubernetes - Generate files on all the pods

I have a Java API which exports data to Excel and generates a file on the pod where the request is served.
The next request (to download the file) might go to a different pod, and the download fails.
How do I get around this?
How do I generate the files on all the pods? Or how do I make sure the subsequent request goes to the same pod where the file was generated?
I can't give the direct pod URL, as it will not be accessible to clients.
Thanks.
You need to use persistent volumes to share the same files between your containers. You could use the node storage mounted on the containers (the easiest way) or a distributed file system like NFS, EFS (AWS), GlusterFS, etc.
If you need the simplest way to share the file and your pods are on the same node, you could use hostPath to store the file and share the volume with the other containers.
Assuming you have a Kubernetes cluster with only one node, and you want to share the path /mnt/data of your node with your pods:
Create a PersistentVolume:
A hostPath PersistentVolume uses a file or directory on the Node to emulate network-attached storage.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
Create a PersistentVolumeClaim:
Pods use PersistentVolumeClaims to request physical storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
Look at the PersistentVolumeClaim:
kubectl get pvc task-pv-claim
The output shows that the PersistentVolumeClaim is bound to your PersistentVolume, task-pv-volume.
NAME            STATUS   VOLUME           CAPACITY   ACCESSMODES   STORAGECLASS   AGE
task-pv-claim   Bound    task-pv-volume   10Gi       RWO           manual         30s
Create a deployment with 2 replicas for example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
      - name: task-pv-storage
        persistentVolumeClaim:
          claimName: task-pv-claim
      containers:
      - name: task-pv-container
        image: nginx
        ports:
        - containerPort: 80
          name: "http-server"
        volumeMounts:
        - mountPath: "/mnt/data"
          name: task-pv-storage
Now you can check that inside both containers the path /mnt/data has the same files.
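For example, a quick way to verify this (the pod names are placeholders; use the names kubectl get pods gives you):

# list the two replicas created by the Deployment
kubectl get pods -l app=nginx

# the same files should show up in both pods
kubectl exec <nginx-pod-1> -- ls /mnt/data
kubectl exec <nginx-pod-2> -- ls /mnt/data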
If you have a cluster with more than one node, I recommend looking at the other types of persistent volumes.
References:
Configure persistent volumes
Persistent volumes
Volume Types

How to reattach released PersistentVolume in Kubernetes

Here is my overall goal:
Have a MongoDB running
Persist the data through pod failures / updates etc
The approach I’ve taken:
K8S Provider: Digital Ocean
Nodes: 3
Create a PVC
Create a headless Service
Create a StatefulSet
Here’s a dumbed down version of the config:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: some-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: do-block-storage
---
apiVersion: v1
kind: Service
metadata:
  name: some-headless-service
  labels:
    app: my-app
spec:
  ports:
    - port: 27017
      name: my-app-database
  clusterIP: None
  selector:
    app: my-app
    tier: database
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app-database
  labels:
    app: my-app
    tier: database
spec:
  serviceName: some-headless-service
  replicas: 1
  selector:
    matchLabels:
      app: my-app
      tier: database
  template:
    metadata:
      labels:
        app: my-app
        tier: database
    spec:
      containers:
        - name: my-app-database
          image: mongo:latest
          volumeMounts:
            - name: some-volume
              mountPath: /data
          ports:
            - containerPort: 27017
              name: my-app-database
      volumes:
        - name: some-volume
          persistentVolumeClaim:
            claimName: some-pvc
This is working as expected. I can spin down the replicas to 0:
kubectl scale --replicas=0 statefulset/my-app-database
Spin it back up:
kubectl scale --replicas=1 statefulset/my-app-database
And the data will persist.
But one time, as I was messing around by scaling the statefulset up and down, I was met with this error:
Volume is already exclusively attached to one node and can't be attached to another
Being new to k8s, I deleted the PVC and “recreated” the same one:
kubectl delete pvc some-pvc
kubectl apply -f persistent-volume-claims/
The statefulset spun back up with a new PV and the old PV was deleted as the persistentVolumeReclaimPolicy was set to Delete by default.
I set this new PV's persistentVolumeReclaimPolicy to Retain to ensure that the data would not be automatically removed... and then I realized: I'm not sure how I'd reclaim that PV. Earlier, to get past the "volume attachment" error, I deleted the PVC, which with my setup just creates another new PV, and now I'm left with my data sitting in that Released PV.
My main questions are:
Does this overall sound like the right approach for my goal?
Should I look into adding a claimRef to the dynamically created PV and then recreating a new PVC with that claimRef, as mentioned here: Can a PVC be bound to a specific PV?
Should I be trying to get that fresh statefulset PVC to actually use that old PV?
Would it make sense to try to reattach the old PV to the correct node, and how would I do that?
If you want to use a StatefulSet with scalability, your storage should also support it. There are two ways to handle this:
If the do-block-storage storage class supports ReadWriteMany, put all pods' data in a single volume.
Have each pod use a different volume: add volumeClaimTemplates to your StatefulSet.spec,
and then Kubernetes will automatically create PVCs named like some-pvc-{statefulset_name}-{idx}:
spec:
  volumeClaimTemplates:
  - metadata:
      name: some-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: do-block-storage
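After scaling the StatefulSet (see the update below about configuring MongoDB replication first), you can list the per-pod claims generated from the template; for this example they would be named some-pvc-my-app-database-0, some-pvc-my-app-database-1, and so on:

kubectl scale --replicas=2 statefulset/my-app-database
kubectl get pvc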
Update:
StatefulSet replicas must be deployed with MongoDB replication; then each pod in the StatefulSet will hold the same data.
So when the container runs the mongod command, you must add the option --replSet={name}. When all pods are up, execute rs.initiate() to tell MongoDB how to handle data replication. When you scale the StatefulSet up or down, execute rs.add() or rs.remove() to tell MongoDB that the members have changed.
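For example, in the StatefulSet from the question the container section could look roughly like this (the replica set name rs0 is a placeholder, and rs.initiate()/rs.add() still have to be run in the mongo shell afterwards):

      containers:
        - name: my-app-database
          image: mongo:latest
          command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]   # rs0: placeholder replica set name
          volumeMounts:
            - name: some-volume
              mountPath: /data
          ports:
            - containerPort: 27017
              name: my-app-database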