Kubernetes StatefulSet ends in Completed state - kubernetes

I'm running a k8s cluster on Google GKE where I have StatefulSets running Redis and Elasticsearch.
Every now and then the pods end up in a Completed state, so they aren't running anymore and the services depending on them fail.
These pods also never restart by themselves; a simple kubectl delete pod x resolves the problem, but I want my pods to heal by themselves.
I'm running the latest available version, 1.6.4, and I have no clue why they aren't picked up and restarted like any other regular pod. Maybe I'm missing something obvious.
Edit: I've also noticed the pod gets a termination signal and shuts down properly, so I'm wondering where that is coming from. I'm not shutting it down manually, and I experience the same with Elasticsearch.
This is my statefulset resource declaration:
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: "redis"
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:3.2-alpine
          ports:
            - name: redis-server
              containerPort: 6379
          volumeMounts:
            - name: redis-storage
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: redis-storage
        annotations:
          volume.alpha.kubernetes.io/storage-class: anything
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi

Check the version of Docker you run, and whether the Docker daemon was restarted during that time.
If the Docker daemon was restarted, all the containers would be terminated (unless you use the new "live restore" feature in 1.12). In some Docker versions, Docker may incorrectly report "exit code 0" for all containers terminated in this situation. See https://github.com/docker/docker/issues/31262 for more details.
source: https://stackoverflow.com/a/43051371/5331893
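If you want to verify this on a node, a minimal check could look like the following (a sketch only; it assumes SSH access to the node, a systemd-managed Docker daemon, and the default /etc/docker/daemon.json location):
$ docker version --format '{{.Server.Version}}'          # daemon version
$ systemctl show docker --property=ActiveEnterTimestamp  # when the daemon last (re)started
Live restore (Docker >= 1.12) keeps containers running across daemon restarts; it is enabled in daemon.json:
{
  "live-restore": true
}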

I am using the same configuration as you, but removing the annotation in the volumeClaimTemplates since I am trying this on minikube:
$ cat sc.yaml
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: "redis"
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:3.2-alpine
          ports:
            - name: redis-server
              containerPort: 6379
          volumeMounts:
            - name: redis-storage
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: redis-storage
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi
Now, to simulate the case where Redis fails, I exec into the pod and kill the redis-server process:
$ k exec -it redis-0 sh
/data # kill 1
/data # $
Immediately after the process dies, you can see that the STATUS has changed to Completed:
$ k get pods
NAME      READY     STATUS      RESTARTS   AGE
redis-0   0/1       Completed   1          38s
It took a little while for Redis to get back up and running:
$ k get pods
NAME      READY     STATUS    RESTARTS   AGE
redis-0   1/1       Running   2          52s
But soon after that I could see it restarting the pod. Can you see the events triggered when this happened? For example, was there a problem when re-attaching the volume to the pod?
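To pull those events out, something like this should work (using the pod name from the question):
$ kubectl describe pod redis-0     # the Events section lists kills, restarts and volume attach/mount problems
$ kubectl get events --sort-by=.metadata.creationTimestamp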

Related

Kubernetes Persistent Volume Claim doesn't save the data

I made a PersistentVolumeClaim on Kubernetes to save MongoDB data. After restarting the deployment I found that the data no longer exists, even though my PVC is in Bound state.
Here is my yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-mongo-depl
spec:
  replicas: 1
  selector:
    matchLabels:
      app: auth-mongo
  template:
    metadata:
      labels:
        app: auth-mongo
    spec:
      volumes:
        - name: auth-mongo-data
          persistentVolumeClaim:
            claimName: auth-mongo-pvc
      containers:
        - name: auth-mongo
          image: mongo
          ports:
            - containerPort: 27017
              name: 'auth-mongo-port'
          volumeMounts:
            - name: auth-mongo-data
              mountPath: '/data/db'
---
# Persistent Volume Claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: auth-mongo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
and I made a ClusterIP service for the deployment.
First off, if the PVC status is still Bound and the desired pod happens to start on another node, it will fail because the PV can't be mounted into the pod. This happens because of the reclaimPolicy: Retain of the StorageClass (it can also be set on the PV directly as persistentVolumeReclaimPolicy: Retain). In order to fix this, you have to manually overwrite/delete the claimRef of the PV. Use kubectl patch pv PV_NAME -p '{"spec":{"claimRef": null}}' to do this; after doing so, the PV's status should be Available.
In order to see if your application writes any data to the desired path, run your application and exec into it (kubectl -n NAMESPACE exec POD_NAME -it -- /bin/sh) and check your /data/db. You could also create a file with some random text, restart your application and check again.
I'm fairly certain that if your PV isn't being recreated every time your application starts (which shouldn't be the case, because of Retain), then it's highly likely that your application isn't writing to the path specified. But you could also share your PersistentVolume config with us, as there might be some misconfiguration there as well.
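A minimal version of that write test could look like this (a sketch; it assumes the Deployment above, the default namespace, and a reasonably recent kubectl; the marker file name is made up):
$ kubectl exec -it deploy/auth-mongo-depl -- sh -c 'echo pvc-test > /data/db/marker.txt'
$ kubectl rollout restart deployment auth-mongo-depl
$ kubectl exec -it deploy/auth-mongo-depl -- cat /data/db/marker.txt   # should print pvc-test if the data persists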

Kubernetes mount volume keeps timing out even though volume can be mounted from sudo mount

I have a read-only persistent volume that I'm trying to mount in a StatefulSet, but after making some changes to the program and re-creating the pods, the pod can no longer mount the volume.
PV yaml file:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: foo-pv
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadOnlyMany
  nfs:
    server: <ip>
    path: "/var/foo"
  claimRef:
    name: foo-pvc
    namespace: foo
PVC yaml file:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: foo-pvc
  namespace: foo
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""
  volumeName: foo-pv
  resources:
    requests:
      storage: 2Gi
Statefulset yaml:
apiVersion: v1
kind: Service
metadata:
  name: foo-service
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 80
  selector:
    app: foo-app
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: foo-statefulset
  namespace: foo
spec:
  selector:
    matchLabels:
      app: foo-app
  serviceName: foo-app
  replicas: 1
  template:
    metadata:
      labels:
        app: foo-app
    spec:
      serviceAccountName: foo-service-account
      containers:
        - name: fooContainer
          image: <image>
          imagePullPolicy: Always
          volumeMounts:
            - name: writer-data
              mountPath: <path>
            - name: nfs-objectd
              mountPath: <path>
      volumes:
        - name: nfs-foo
          persistentVolumeClaim:
            claimName: foo-pvc
  volumeClaimTemplates:
    - metadata:
        name: writer-data
      spec:
        accessModes: [ "ReadWriteMany" ]
        storageClassName: "foo-sc"
        resources:
          requests:
            storage: 2Gi
kubectl describe pod reports "Unable to attach or mount volumes: unmounted volumes=[nfs-foo]: timed out waiting for the condition". There is a firewall between the machine running Kubernetes and the NFS server, but the port has been unblocked and the folder has been exported for mounting on the NFS side. Running sudo mount -t nfs <ip>:/var/foo /var/foo successfully mounts the NFS share, so I don't understand why Kubernetes isn't able to mount it anymore. It has been stuck failing to mount for several days now. Is there any other way to debug this?
Thanks!
Based on the error “unable to attach or mount volumes ... timed out waiting for condition”, there have been similar issues reported to the Product Team and it is a known issue. However, this error is mostly observed on preemptible/spot nodes when the node is preempted. In similar occurrences of this issue for other users, upgrading the control plane version temporarily resolved the issue on preemptible/spot nodes.
Also, if you are not using any preemptible/spot nodes in your cluster, this issue might have happened when an old node was replaced by a new node. If you are still facing this issue, try upgrading the control plane to the same version, i.e. you can execute the following command:
$ gcloud container clusters upgrade CLUSTER_NAME --master --zone ZONE --cluster-version VERSION
Another workaround to fix this issue would be to remove the stale VolumeAttachment with the following command:
$ kubectl delete volumeattachment [volumeattachment_name]
After running the command and thus removing the VolumeAttachment, the pod should eventually pick up and retry. You can read more about this issue and its cause here.
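To find which VolumeAttachment is stale, listing them first helps (a small sketch; the grep pattern assumes the foo-pv name from the question):
$ kubectl get volumeattachments                  # shows PV name, node and attach status
$ kubectl get volumeattachments | grep foo-pv    # narrow down to the stuck PV
You can then delete the matching VolumeAttachment with the delete command above.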

CreateContainerError while creating postgresql in k8s

I'm trying to run a PostgreSQL database on k8s. There are no errors while creating everything from the file, but the pod of the deployment can't create its container.
There is my yaml code:
ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  labels:
    app: postgres
data:
  POSTGRES_DB: postgresdb
  POSTGRES_USER: postgresadmin
  POSTGRES_PASSWORD: adminpassword
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:10.18
          imagePullPolicy: "IfNotPresent"
          ports:
            - containerPort: 5432
          envFrom:
            - configMapRef:
                name: postgres-config
          volumeMounts:
            - mountPath: /var/lib/postgresql/data
              name: postgredb
      volumes:
        - name: postgredb
          persistentVolumeClaim:
            claimName: postgres-pv-claim
Service:
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  type: ClusterIP
  ports:
    - port: 5432
  selector:
    app: postgres
After running:
kubectl create -f filename
I got:
configmap/postgres-config created
persistentvolume/postgres-pv-volume created
persistentvolumeclaim/postgres-pv-claim created
deployment.apps/postgres created
service/postgres created
But when I type:
kubectl get pods
There is an error:
postgres-78496cc865-85kt7 0/1 CreateContainerError 0 13m
This is the PV and PVC; there was no more space in the question to add it as code :)
If you describe the pod, you'll see the warning message in there:
Warning FailedScheduling 45s (x2 over 45s) default-scheduler persistentvolumeclaim "postgres-pv-claim" not found
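For example, to surface that warning and related events (pod name taken from the output above):
$ kubectl describe pod postgres-78496cc865-85kt7
$ kubectl get events --field-selector involvedObject.name=postgres-78496cc865-85kt7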
On a high level, a database instance can run within a Kubernetes container. A database instance stores data in files, and the files are stored in persistent volume claims. A PersistentVolumeClaim must be created and made available to a PostgreSQL instance. To create the database instance as a container, you use a deployment configuration. In order to provide an access interface that is independent of the particular container, you create a service that provides access to the database. The service remains unchanged even if a container (or pod) is moved to a different node.
In your case, create a PVC resource and bind it to a PV so that it will be used by the pod. As the pod currently cannot find that claim, it stays in a pending state. This can be achieved in multiple ways; for example, you can use hostPath as the local storage. Once the claim is bound, the pod starts:
$ k get pods
NAME                        READY     STATUS    RESTARTS   AGE
postgres-795cfcd67b-khfgn   1/1       Running   0          18s
Sample PV and PVC configs as below,
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nautilus
spec:
  storageClassName: manual
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/home/mohan"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
You can check the Persistent Volumes doc for more details. Also, read more about storage classes and StatefulSets for deploying database applications in a Kubernetes cluster.
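If you go the StatefulSet route instead of a Deployment, a minimal sketch could look like the following (an illustrative example, not the asker's config; it reuses the postgres-config ConfigMap from the question and assumes a storage class named standard exists):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:10.18
          envFrom:
            - configMapRef:
                name: postgres-config   # ConfigMap from the question
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: postgredb
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:                  # one PVC is created per replica
    - metadata:
        name: postgredb
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: standard       # assumption: the cluster provides this class
        resources:
          requests:
            storage: 5Gi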
Thanks to all who tried to help me! The problem was in PersistentVolume.spec.hostPath.path: there was an invalid character in the path. I had tried to use "./path".

Graceful scaledown of stateful apps in Kubernetes

I have a stateful application deployed in a Kubernetes cluster. Now the challenge is how to scale the cluster down gracefully, so that each pod, while terminating (during scale-down), completes its pending tasks and then shuts down gracefully. The scenario is similar to what is explained below, but in my case the terminating pods will have a few in-flight tasks to be processed.
https://medium.com/@marko.luksa/graceful-scaledown-of-stateful-apps-in-kubernetes-2205fc556ba9
Do we have official feature support for this from the Kubernetes API?
Kubernetes version: v1.11.0
Host OS: linux/amd64
CRI version: Docker 1.13.1
UPDATE :
Possible solution - While performing a StatefulSet scale-down, the preStop hook of the terminating pod(s) will send a message to a queue with the metadata of the respective task(s) still to be completed. Afterwards, a K8s Job completes those tasks. Please comment on whether this is a recommended approach from a K8s perspective.
Thanks In Advance!
Regards,
Balu
Your pod will be scaled down only after the in-progress job is completed. You may additionally configure the lifecycle in the deployment manifest with the preStop hook, which will gracefully stop your application. This is one of the best practices to follow. Please refer to this for a detailed explanation and the syntax.
Updated Answer
This is the yaml I deployed locally; I then generated load to raise the CPU utilization and trigger the HPA.
Deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  namespace: default
  name: whoami
  labels:
    app: whoami
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: containous/whoami
          resources:
            requests:
              cpu: 30m
            limits:
              cpu: 40m
          ports:
            - name: web
              containerPort: 80
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - echo "Starting Sleep"; date; sleep 600; echo "Pod will be terminated now"
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: whoami
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: whoami
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 40
    # - type: Resource
    #   resource:
    #     name: memory
    #     targetAverageUtilization: 10
---
apiVersion: v1
kind: Service
metadata:
  name: whoami-service
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
      name: http
  selector:
    app: whoami
Once the pod is deployed, execute the below commands to generate the load (the while loop is run inside the busybox shell):
kubectl run -i --tty load-generator --image=busybox /bin/sh
while true; do wget -q -O- http://whoami-service.default.svc.cluster.local; done
Once the replicas were created, I stopped the load, and the scaled-up pods were terminated after 600 seconds. This scenario worked for me. I believe it would be a similar case for a StatefulSet as well. Hope this helps.
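One caveat: the preStop hook runs within the pod's termination grace period, and the default terminationGracePeriodSeconds is 30, so a 600-second sleep would normally be cut short unless the grace period is raised as well. A sketch of that addition to the pod template above (the value is illustrative):
# add to the pod template of the whoami Deployment above
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 660   # must cover the 600s preStop sleep plus normal shutdown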

No way to apply change to Deployment bound with ReadWriteOnce PV in gCloud Kubernetes engine?

As the GCE disk does not support ReadWriteMany, I have no way to apply a change to the Deployment without getting stuck at ContainerCreating with FailedAttachVolume.
So here's my setting:
1. PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
  labels:
    app: mysql
spec:
  storageClassName: "standard"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
2. Service
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      port: 3306
      targetPort: 3306
  selector:
    app: mysql
3. Deployment
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - image: mysql/mysql-server
          name: mysql
          ports:
            - containerPort: 3306
              name: mysql
          volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /mysql-data
      volumes:
        - name: mysql-persistent-storage
          persistentVolumeClaim:
            claimName: mysql-pv-claim
These are all fine for creating the PVC, Service and Deployment. The pod and container started successfully and worked as expected.
However, when I tried to apply a change with:
kubectl apply -f mysql_deployment.yaml
Firstly, the pods got stuck: the existing one did not terminate and the new one stayed in ContainerCreating forever.
NAME            READY     STATUS              RESTARTS   AGE
mysql-nowhash   1/1       Running             0          2d
mysql-newhash   0/2       ContainerCreating   0          15m
Secondly, from the gCloud console, inside the pod that was being created, I got two crucial error events:
1 of 2 FailedAttachVolume
Multi-Attach error for volume "pvc-<hash>" Volume is already exclusively attached to one node and can't be attached to another FailedAttachVolume
2 of 2 FailedMount
Unable to mount volumes for pod "<pod name and hash>": timeout expired waiting for volumes to attach/mount for pod "default"/"<pod name and hash>". list of unattached/unmounted volumes=[mysql-persistent-storage]
What I could immediately think of is the ReadWriteOnce capability of the gCloud PV: Kubernetes Engine creates the new pod before terminating the existing one, so under ReadWriteOnce it can never bring up the new pod and claim the existing PVC...
Any ideas, or should I use some other way to perform deployment updates?
I appreciate any contribution and suggestion =)
Remark: my current work-around is to create an interim NFS pod to make it behave like a ReadWriteMany PVC. This works, but sounds silly... it requires additional storage I/O overhead just to facilitate a deployment update?.. =P
The reason is that with the default update strategy, RollingUpdate, k8s waits for the new container to become ready before shutting down the old one. You can change this behaviour by setting the Deployment's strategy type to Recreate.
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
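A minimal sketch of that change, to merge into the mysql Deployment spec from the question:
# only the relevant field is shown
spec:
  strategy:
    type: Recreate   # the old pod is terminated (releasing the RWO disk) before the new one is created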