We have successfully created the pods, services, and replication controllers according to our project requirements. Now we are planning to set up persistent storage in AWS using Kubernetes. I have created the YAML file to create an EBS volume in AWS, and it works as expected: I am able to claim the volume and successfully mount it to my pod (this is for a single replica only).
I can create the file successfully, and the volume is also created, but my pods go into a Pending state while the volume still shows as available in AWS. I cannot see any error logs there.
Storage file:
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: mongo-ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
Main file:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: web2
spec:
  selector:
    matchLabels:
      app: mongodb
  serviceName: "mongodb"
  replicas: 2
  template:
    metadata:
      labels:
        app: mongodb
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      containers:
      - image: mongo
        name: mongodb
        ports:
        - name: web2
          containerPort: 27017
          hostPort: 27017
        volumeMounts:
        - mountPath: "/opt/couchbase/var"
          name: mypd1
  volumeClaimTemplates:
  - metadata:
      name: mypd1
      annotations:
        volume.alpha.kubernetes.io/storage-class: mongo-ssd
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
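As a side note, on clusters where the alpha storage-class annotation is no longer honored, the claim template can reference the StorageClass through the storageClassName field instead. A sketch of the same claim in that form (behavior otherwise unchanged):

```yaml
volumeClaimTemplates:
- metadata:
    name: mypd1
  spec:
    storageClassName: mongo-ssd   # replaces the volume.alpha.kubernetes.io annotation
    accessModes: [ "ReadWriteOnce" ]
    resources:
      requests:
        storage: 10Gi
```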
Kubectl version:
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T10:09:24Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:23:29Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
I can see you have used hostPort in your container. In this case, if you do not have more than one node in your cluster, one Pod will remain Pending, because it will not fit on any node.
containers:
- image: mongo
  name: mongodb
  ports:
  - name: web2
    containerPort: 27017
    hostPort: 27017
I am getting this error when I describe my pending Pod:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 27s (x7 over 58s) default-scheduler No nodes are available that match all of the predicates: PodFitsHostPorts (1).
A hostPort in your container is bound to a port on your node. Suppose you use hostPort 10733 but another pod is already using that port; then your pod cannot use it, so it stays in a Pending state. And if you have replicas: 2 and both pods are scheduled onto the same node, they cannot both start either.
So you need to use a hostPort that you can be sure no one else is using.
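If the MongoDB port does not actually need to be exposed on the node itself, the simplest fix is to drop hostPort from the container spec entirely, so the scheduler is free to place both replicas. A sketch of the container section without it:

```yaml
containers:
- image: mongo
  name: mongodb
  ports:
  - name: web2
    containerPort: 27017   # no hostPort: pods no longer compete for a node port
```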
Related
I am trying to use NFS volume in the same cluster I have deployed other k8s services. But one of the services using the NFS fails with
Output: mount.nfs: mounting nfs.default.svc.cluster.local:/opt/shared-shibboleth-idp failed, reason given by server: No such file or directory
The nfs PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  nfs:
    server: nfs.default.svc.cluster.local # nfs is from svc {{ include "nfs.name" .}}
    path: "/opt/shared-shibboleth-idp"
Description of nfs service
➜ helm git:(ft-helm) ✗ kubectl describe svc nfs
Name: nfs
Namespace: default
Labels: app=nfs
chart=nfs-1.0.0
heritage=Tiller
Annotations: <none>
Selector: role=nfs
Type: ClusterIP
IP: 10.19.251.72
Port: mountd 20048/TCP
TargetPort: 20048/TCP
Endpoints: 10.16.1.5:20048
Port: nfs 2049/TCP
TargetPort: 2049/TCP
Endpoints: 10.16.1.5:2049
Port: rpcbind 111/TCP
TargetPort: 111/TCP
Endpoints: 10.16.1.5:111
And the nfs deployment
➜ helm git:(ft-helm) ✗ kubectl describe replicationcontrollers telling-quoll-nfs
Name: telling-quoll-nfs
Namespace: default
Selector: role=nfs
Labels: app=nfs
chart=nfs-1.0.0
heritage=Tiller
Annotations: <none>
Replicas: 1 current / 1 desired
Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: role=nfs
Containers:
nfs:
Image: k8s.gcr.io/volume-nfs:0.8
Ports: 20048/TCP, 2049/TCP, 111/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Environment: <none>
Mounts:
/exports from nfs (rw)
Volumes:
nfs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs-pv-provisioning-demo
ReadOnly: false
Events: <none>
And where it is being used
volumeMounts:
# names must match the volume names below
- name: RELEASE-NAME-shared-shib
  mountPath: "/opt/shared-shibboleth-idp"
...
volumes:
- name: RELEASE-NAME-shared-shib
  persistentVolumeClaim:
    claimName: nfs
...
k8s version
➜ helm git:(ft-helm) ✗ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-20T04:49:16Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.7-gke.8", GitCommit:"7d3d6f113e933ed1b44b78dff4baf649258415e5", GitTreeState:"clean", BuildDate:"2019-06-19T16:37:16Z", GoVersion:"go1.11.5b4", Compiler:"gc", Platform:"linux/amd64"}
As mentioned in the comments by Patrick W and damitj07:
You have to create the folder or directory manually before trying to mount it; otherwise Kubernetes will raise an error because the destination directory does not exist.
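A minimal sketch of that fix, assuming the volume-nfs image above exports /exports as the share root; the pod name is a placeholder:

```shell
# Create the exported subdirectory inside the NFS server pod before mounting it.
# <nfs-server-pod> is a placeholder for the actual pod name (e.g. one managed by
# the telling-quoll-nfs replication controller).
kubectl exec <nfs-server-pod> -- mkdir -p /exports/opt/shared-shibboleth-idp
```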
UPDATE: In the end, this has nothing to do with Azure File Share; it is actually the same with Azure Disk, NFS, or HostPath.
I have mounted an Azure File Share volume into a MongoDB pod with the mountPath /data. Everything seems to work as expected. When I exec into the pod, I can see the mongo data in /data/db, but on the Azure File Share I can only see the folders /db and /dbconfig, not the files. Any idea? I have granted permission 0777 on the volume.
This is my yaml files
StorageClass
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile
provisioner: kubernetes.io/azure-file
mountOptions:
- dir_mode=0777
- file_mode=0777
- uid=999
- gid=999
parameters:
  storageAccount: ACCOUNT_NAME
  skuName: Standard_LRS
PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azurefile
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: azurefile
  resources:
    requests:
      storage: 20Gi
Mongo deployement file
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: mongo
  labels:
    app: mongo
  namespace: development
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - name: mongo
        image: "mongo"
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 27017
          protocol: TCP
        volumeMounts:
        - mountPath: /data
          name: mongovolume
          subPath: mongo
      imagePullSecrets:
      - name: secret-acr
      volumes:
      - name: mongovolume
        persistentVolumeClaim:
          claimName: azurefile
Kubernetes version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.6", GitCommit:"a21fdbd78dde8f5447f5f6c331f7eb6f80bd684e", GitTreeState:"clean", BuildDate:"2018-07-26T10:04:08Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Solved the problem by changing the mongo image to docker.io/bitnami/mongodb:4.0.2-debian-9. With this image, the mongo data is written to the file share and the data is now persistent.
The original setup works with neither Azure Files nor Azure Disks.
I worked on a project where I faced a similar issue and contacted Azure support, but they did not have a specific resolution for it.
Root cause provided by Azure support: the data/files that do not remain persistent are the ones owned by mongodb.
As GCE Disks do not support ReadWriteMany, I have no way to apply a change to my Deployment; it gets stuck at ContainerCreating with a FailedAttachVolume error.
So here's my setting:
1. PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
  labels:
    app: mysql
spec:
  storageClassName: "standard"
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
2. Service
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  type: ClusterIP
  ports:
  - protocol: TCP
    port: 3306
    targetPort: 3306
  selector:
    app: mysql
3. Deployment
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - image: mysql/mysql-server
        name: mysql
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /mysql-data
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pv-claim
These all work fine for creating the PVC, Service, and Deployment; the pod and container start successfully and work as expected.
However, when I tried to apply a change with:
kubectl apply -f mysql_deployment.yaml
First, the pods got stuck: the existing one did not terminate, and the new one stayed in ContainerCreating forever.
NAME READY STATUS RESTARTS AGE
mysql-nowhash 1/1 Running 0 2d
mysql-newhash 0/2 ContainerCreating 0 15m
Second, from the Google Cloud console, inside the pod that was being created, I got two crucial error logs:
1 of 2 FailedAttachVolume
Multi-Attach error for volume "pvc-<hash>" Volume is already exclusively attached to one node and can't be attached to another FailedAttachVolume
2 of 2 FailedMount
Unable to mount volumes for pod "<pod name and hash>": timeout expired waiting for volumes to attach/mount for pod "default"/"<pod name and hash>". list of unattached/unmounted volumes=[mysql-persistent-storage]
What I immediately think of is the ReadWriteOnce limitation of the GCE PV: the Kubernetes engine creates a new pod before terminating the existing one, so under ReadWriteOnce the new pod can never attach the PVC that the existing pod still holds...
Any idea, or should I use some other way to perform deployment updates?
I appreciate any contribution or suggestion =)
Remark: my current workaround is to create an interim NFS pod to make the volume behave like a ReadWriteMany PVC. This works, but it feels clumsy, requiring additional storage I/O overhead just to facilitate deployment updates... =P
The reason is that with the RollingUpdate strategy (the default for Deployments), k8s waits for the new container to become ready before shutting down the old one. You can change this behaviour by setting the strategy type to Recreate.
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
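A minimal sketch of that change in the mysql Deployment above (only the strategy stanza is new):

```yaml
spec:
  strategy:
    type: Recreate   # the old pod is terminated before the new one is created,
                     # so the ReadWriteOnce volume can be detached and re-attached
```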
In this repository https://github.com/mappedinn/kubernetes-nfs-volume-on-gke I am trying to share a volume through an NFS service on GKE. The NFS file sharing is successful if a hard-coded IP address is used.
But, in my view, it would be better to use the DNS name instead of a hard-coded IP address.
Below is the declaration of the NFS service being used for sharing a volume in Google Cloud Platform:
apiVersion: v1
kind: Service
metadata:
  name: nfs-server
spec:
  ports:
  - name: nfs
    port: 2049
  - name: mountd
    port: 20048
  - name: rpcbind
    port: 111
  selector:
    role: nfs-server
Below is the definition of the PersistentVolume with hard coded IP address:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: wp01-pv-data
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany
  nfs:
    server: 10.247.248.43 # with the hard-coded IP, it works
    path: "/"
Below is the definition of the PersistentVolume with DNS name:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: wp01-pv-data
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany
  nfs:
    server: nfs-service.default.svc.cluster.local # with DNS, it does not work
    path: "/"
I am using https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/ to get the DNS name of the service. Is there anything I have missed?
Thanks
The problem is DNS resolution on the node itself. Mounting the NFS share to the pod is a job of the kubelet, which runs on the node, so DNS resolution happens according to /etc/resolv.conf on the node itself. What could suffice is adding nameserver <your_kubedns_service_ip> to the node's /etc/resolv.conf, but this can become something of a chicken-and-egg problem in some corner cases.
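One quick way to confirm this diagnosis is to test resolution from the node rather than from inside a pod (a sketch; run on the node over SSH):

```shell
# On the node itself, using the node's own /etc/resolv.conf:
nslookup nfs-server.default.svc.cluster.local   # typically fails on the node
cat /etc/resolv.conf                            # node resolver usually lacks the cluster DNS IP
```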
I solved the problem by simply upgrading my GKE cluster from version 1.7.11-gke.1 to 1.8.6-gke.0.
kubectl version
# Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
# Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.6-gke.0", GitCommit:"ee9a97661f14ee0b1ca31d6edd30480c89347c79", GitTreeState:"clean", BuildDate:"2018-01-05T03:36:42Z", GoVersion:"go1.8.3b4", Compiler:"gc", Platform:"linux/amd64"}
Actually, this is the final version of the yaml files:
apiVersion: v1
kind: Service
metadata:
  name: nfs-server
spec:
  # clusterIP: 10.3.240.20
  ports:
  - name: nfs
    port: 2049
  - name: mountd
    port: 20048
  - name: rpcbind
    port: 111
  selector:
    role: nfs-server
  # type: "LoadBalancer"
and
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 1Mi
  accessModes:
  - ReadWriteMany
  nfs:
    # FIXED: Use the internal DNS name
    server: nfs-server.default.svc.cluster.local
    path: "/"
I'm running a k8s cluster on Google GKE where I have StatefulSets running Redis and ElasticSearch.
Every now and then the pods end up in a Completed state, so they aren't running anymore and the services depending on them fail.
These pods also never restart by themselves; a simple kubectl delete pod x resolves the problem, but I want my pods to heal by themselves.
I'm running the latest available version, 1.6.4, and I have no clue why they aren't picked up and restarted like any other regular pod. Maybe I'm missing something obvious.
Edit: I've also noticed the pod gets a termination signal and shuts down properly, so I'm wondering where that is coming from. I'm not shutting it down manually, and I experience the same with ElasticSearch.
This is my statefulset resource declaration:
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: "redis"
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:3.2-alpine
        ports:
        - name: redis-server
          containerPort: 6379
        volumeMounts:
        - name: redis-storage
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: redis-storage
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
Check which version of Docker you run, and whether the Docker daemon was restarted during that time.
If the Docker daemon was restarted, all containers would be terminated (unless you use the "live restore" feature introduced in Docker 1.12). In some Docker versions, Docker may incorrectly report "exit code 0" for all containers terminated in this situation. See https://github.com/docker/docker/issues/31262 for more details.
source: https://stackoverflow.com/a/43051371/5331893
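To check this on the affected node, something along these lines should help (a sketch; the exact docker info output varies by Docker version, and the systemd unit name is an assumption):

```shell
docker version --format '{{.Server.Version}}'
# Whether live restore is enabled (field name varies across Docker versions):
docker info 2>/dev/null | grep -i 'live restore'
# When the Docker daemon last (re)started, assuming it runs under systemd:
systemctl show docker --property=ActiveEnterTimestamp
```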
I am using the same configuration as you, but removing the annotation in the volumeClaimTemplates since I am trying this on minikube:
$ cat sc.yaml
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: "redis"
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:3.2-alpine
        ports:
        - name: redis-server
          containerPort: 6379
        volumeMounts:
        - name: redis-storage
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: redis-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
Now, to simulate a case where redis fails, I exec into the pod and kill the redis server process:
$ k exec -it redis-0 sh
/data # kill 1
/data # $
See that immediately after the process dies, the STATUS changes to Completed:
$ k get pods
NAME READY STATUS RESTARTS AGE
redis-0 0/1 Completed 1 38s
It took some time for redis to come back up:
$ k get pods
NAME READY STATUS RESTARTS AGE
redis-0 1/1 Running 2 52s
But soon after that I could see the pod being restarted. Can you check the events triggered when this happened? For example, was there a problem re-attaching the volume to the pod?