PostgreSQL-HA on Kubernetes recover from Volume Snapshot? - postgresql

I have a Kubernetes Volume Snapshot created for pgsql-ha persistent volume backup.
I'm able to recover the PVC by specifying the volume snapshot as the dataSource. Now I'm trying to create a new pgsql-ha cluster using the Helm chart and attach this PVC to recover the data. Here is the example installation command:
helm install db-ha bitnami/postgresql-ha \
--set postgresql.password=$PWD \
--set persistence.existingClaim="pvc-restore-from-snapshot"
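For reference, the restored PVC was created from the snapshot roughly like this (a sketch; the snapshot name and storage class below are assumptions, substitute your own):

```shell
# Sketch: restore a PVC from a VolumeSnapshot (names are assumptions).
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-restore-from-snapshot
spec:
  storageClassName: ebs-sc            # assumed EBS CSI storage class
  dataSource:
    name: pgsql-ha-snapshot           # assumed VolumeSnapshot name
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
EOF
```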
Then the pgpool and both postgresql Pods show CrashLoopBackOff forever.
$ kubectl get pods --watch
NAME READY STATUS RESTARTS AGE
db-ha-pgpool-gradfergr43sfxv 0/1 Running 0 8s
db-ha-postgresql-0 0/1 Init:0/1 0 8s
db-ha-postgresql-1 0/1 Init:0/1 0 8s
db-ha-postgresql-1 0/1 PodInitializing 0 23s
db-ha-postgresql-0 0/1 PodInitializing 0 23s
db-ha-postgresql-1 0/1 Error 0 24s
db-ha-postgresql-0 0/1 Error 0 24s
db-ha-postgresql-1 0/1 Error 1 25s
db-ha-postgresql-0 0/1 Error 1 25s
db-ha-postgresql-1 0/1 CrashLoopBackOff 1 26s
db-ha-postgresql-0 0/1 CrashLoopBackOff 1 27s
From what I have read so far in this issue, persistence.existingClaim is only supported when the replica count is set to 1, which means the data can only be restored onto a non-HA cluster; pgsql-ha is currently unable to replicate a manually specified PVC.
So I'm wondering:
Whether that is the whole story, or there is something I'm missing
Whether it's possible to modify the storageClass or even the provisioner (ebs-csi) so that the existing PVC can be used
Whether other workarounds exist for this workflow
Many thanks!
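One possible workaround (an untested sketch, assuming the chart's replicaCount value behaves as the linked issue describes): restore onto a single-replica cluster first, then scale out so streaming replication re-seeds the standbys from the restored primary.

```shell
# Untested workaround sketch: single-replica restore, then scale out.
helm install db-ha bitnami/postgresql-ha \
  --set postgresql.password=$PWD \
  --set postgresql.replicaCount=1 \
  --set persistence.existingClaim="pvc-restore-from-snapshot"

# Once the primary is healthy, add standbys; they get fresh PVCs and
# replicate from the restored primary.
helm upgrade db-ha bitnami/postgresql-ha \
  --reuse-values \
  --set postgresql.replicaCount=3
```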

Related

Kubernetes CrashLoopBackOff default timing

What are the defaults for the Kubernetes CrashLoopBackOff?
Say, I have a pod:
kubectl run mynginx --image nginx -- echo hello
And I inspect its status:
kubectl get pods -w
NAME READY STATUS RESTARTS AGE
mynginx 0/1 Pending 0 0s
mynginx 0/1 Pending 0 0s
mynginx 0/1 ContainerCreating 0 0s
mynginx 0/1 Completed 0 2s
mynginx 0/1 Completed 1 4s
mynginx 0/1 CrashLoopBackOff 1 5s
mynginx 0/1 Completed 2 20s
mynginx 0/1 CrashLoopBackOff 2 33s
mynginx 0/1 Completed 3 47s
mynginx 0/1 CrashLoopBackOff 3 59s
mynginx 0/1 Completed 4 97s
mynginx 0/1 CrashLoopBackOff 4 109s
This is "expected": Kubernetes starts the pod, the container exits "too fast", the kubelet restarts it with an increasing delay, and the pod state becomes CrashLoopBackOff.
Now, if I start a pod slightly differently:
kubectl run mynginx3 --image nginx -- /bin/bash -c "sleep 10; echo hello"
I get the following
kubectl get pods -w
NAME READY STATUS RESTARTS AGE
mynginx3 0/1 Pending 0 0s
mynginx3 0/1 Pending 0 0s
mynginx3 0/1 ContainerCreating 0 0s
mynginx3 1/1 Running 0 2s
mynginx3 0/1 Completed 0 12s
mynginx3 1/1 Running 1 14s
mynginx3 0/1 Completed 1 24s
mynginx3 0/1 CrashLoopBackOff 1 36s
mynginx3 1/1 Running 2 38s
mynginx3 0/1 Completed 2 48s
mynginx3 0/1 CrashLoopBackOff 2 62s
mynginx3 1/1 Running 3 75s
mynginx3 0/1 Completed 3 85s
mynginx3 0/1 CrashLoopBackOff 3 96s
mynginx3 1/1 Running 4 2m14s
mynginx3 0/1 Completed 4 2m24s
mynginx3 0/1 CrashLoopBackOff 4 2m38s
This is also expected.
But if I set the sleep to 24 hours, would I still get the same CrashLoopBackOff after the first couple of pod exits, and then again after each subsequent exit?
Based on these docs:
The restartPolicy applies to all containers in the Pod. restartPolicy only refers to restarts of the containers by the kubelet on the same node. After containers in a Pod exit, the kubelet restarts them with an exponential back-off delay (10s, 20s, 40s, …), that is capped at five minutes. Once a container has executed for 10 minutes without any problems, the kubelet resets the restart backoff timer for that container.
I take that to mean that a container that runs for longer than 10 minutes before exiting will have its back-off timer reset, and so will not end up in CrashLoopBackOff.
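The back-off schedule those docs describe can be sketched numerically: delays double from 10s and cap at five minutes.

```shell
# Back-off delays per the docs: 10s, 20s, 40s, ..., capped at 300s.
delay=10
for restart in 1 2 3 4 5 6 7; do
  echo "restart ${restart}: back-off ${delay}s"
  delay=$(( delay * 2 ))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
```

After the fifth restart the delay stays pinned at 300s until a container runs cleanly for 10 minutes, which resets the timer.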

Can't install confluent helm chart on minikube

I'm trying to use the Confluent Helm chart following this link https://github.com/confluentinc/cp-helm-charts/tree/e17565cd5a6985a594155b12b08068cb5882e51f/charts/cp-kafka-connect but when I install it on minikube I get ImagePullBackOff:
confluent-oss3-cp-control-center-5fc8c494c8-k25ps 0/1 ImagePullBackOff 0 76m
confluent-oss3-cp-kafka-0 0/2 ImagePullBackOff 0 76m
confluent-oss3-cp-kafka-connect-7849d49c47-jmmrn 0/2 ImagePullBackOff 0 76m
confluent-oss3-cp-kafka-rest-777cc4899b-zqcf9 0/2 ImagePullBackOff 0 76m
confluent-oss3-cp-ksql-server-567646677-b8lw4 0/2 ImagePullBackOff 0 76m
confluent-oss3-cp-schema-registry-6b8d69887d-5cmvt 0/2 ErrImagePull 0 76m
confluent-oss3-cp-zookeeper-0 0/2 ImagePullBackOff 0 76m
Is there any solution to this problem?
I solved the issue by deleting minikube and reinstalling Helm and the chart.
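Before resorting to a full reinstall, it can be worth finding out why the pull fails. A troubleshooting sketch (the pod name is taken from the output above; the image tag is an assumption, so check your chart's values):

```shell
# Show the pull error details in the pod's events.
kubectl describe pod confluent-oss3-cp-kafka-0 | tail -n 20

# Try the pull from inside minikube's runtime to surface the raw error
# (DNS failure, proxy, registry rate limit, wrong tag, etc.).
minikube ssh -- docker pull confluentinc/cp-kafka:5.0.1
```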

prometheus operator alertmanager-main-0 Pending and then restarting

What happened?
kubernetes version: 1.12
prometheus operator: release-0.1
I follow the README:
$ kubectl create -f manifests/
# It can take a few seconds for the above 'create manifests' command to fully create the following resources, so verify the resources are ready before proceeding.
$ until kubectl get customresourcedefinitions servicemonitors.monitoring.coreos.com ; do date; sleep 1; echo ""; done
$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
$ kubectl apply -f manifests/ # This command sometimes may need to be done twice (to workaround a race condition).
Then I run the command and the output shows:
[root@VM_8_3_centos /data/hansenwu/kube-prometheus/manifests]# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 66s
alertmanager-main-1 1/2 Running 0 47s
grafana-54f84fdf45-kt2j9 1/1 Running 0 72s
kube-state-metrics-65b8dbf498-h7d8g 4/4 Running 0 57s
node-exporter-7mpjw 2/2 Running 0 72s
node-exporter-crfgv 2/2 Running 0 72s
node-exporter-l7s9g 2/2 Running 0 72s
node-exporter-lqpns 2/2 Running 0 72s
prometheus-adapter-5b6f856dbc-ndfwl 1/1 Running 0 72s
prometheus-k8s-0 3/3 Running 1 59s
prometheus-k8s-1 3/3 Running 1 59s
prometheus-operator-5c64c8969-lqvkb 1/1 Running 0 72s
[root@VM_8_3_centos /data/hansenwu/kube-prometheus/manifests]# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 0/2 Pending 0 0s
grafana-54f84fdf45-kt2j9 1/1 Running 0 75s
kube-state-metrics-65b8dbf498-h7d8g 4/4 Running 0 60s
node-exporter-7mpjw 2/2 Running 0 75s
node-exporter-crfgv 2/2 Running 0 75s
node-exporter-l7s9g 2/2 Running 0 75s
node-exporter-lqpns 2/2 Running 0 75s
prometheus-adapter-5b6f856dbc-ndfwl 1/1 Running 0 75s
prometheus-k8s-0 3/3 Running 1 62s
prometheus-k8s-1 3/3 Running 1 62s
prometheus-operator-5c64c8969-lqvkb 1/1 Running 0 75s
I don't know why the pod alertmanager-main-0 goes Pending and then restarts.
Looking at the events, I see:
72s Warning FailedCreate StatefulSet create Pod alertmanager-main-0 in StatefulSet alertmanager-main failed error: The POST operation against Pod could not be completed at this time, please try again.
72s Warning FailedCreate StatefulSet create Pod alertmanager-main-0 in StatefulSet alertmanager-main failed error: The POST operation against Pod could not be completed at this time, please try again.
Most likely the Alertmanager does not get enough time to start correctly.
Have a look at this answer: https://github.com/coreos/prometheus-operator/issues/965#issuecomment-460223268
You can set the paused field to true, and then modify the StatefulSet to see whether extending the liveness/readiness probe timings solves your issue.
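A sketch of what that looks like (per the linked issue, the paused field lives on the Alertmanager custom resource; verify the field names against your operator version):

```shell
# Stop the operator from reverting manual changes to the StatefulSet.
kubectl -n monitoring patch alertmanager main --type merge \
  -p '{"spec":{"paused":true}}'

# Then edit the StatefulSet and relax the liveness/readiness probe
# timings (initialDelaySeconds, timeoutSeconds, failureThreshold).
kubectl -n monitoring edit statefulset alertmanager-main
```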

HTTPError 400 while deploying production-ready GitLab on Google Kubernetes Engine

I'm following the official Tutorial for Deploying production-ready GitLab on Google Kubernetes Engine.
The step "Create the PostgreSQL instance and database", item 1 ("Create the Cloud SQL database that GitLab will use to store most of its metadata"), gave me this error:
gcloud beta sql instances create gitlab-db --network default \
--database-version=POSTGRES_9_6 --cpu 4 --memory 15 --no-assign-ip \
--storage-auto-increase --zone us-central1-a
ERROR: (gcloud.beta.sql.instances.create) HTTPError 400: Invalid request: Project {here_stands_my_correct_Project_ID} has invalid private network
name https://compute.googleapis.com/compute/v1/projects/{here_stands_my_correct_Project_ID}/global/networks/default.
Any ideas, thank you?
EDIT: I used the following command and then manually switched gitlab-db to Private IP with the attached network (default) in the Console, but got a 503 error at the end of the tutorial.
gcloud beta sql instances create gitlab-db --database-version=POSTGRES_9_6 --cpu 4 --memory 15 --storage-auto-increase --zone us-central1-a
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
gitlab-certmanager-788c6859c6-szqqm 1/1 Running 0 28m
gitlab-gitaly-0 0/1 Pending 0 28m
gitlab-gitlab-runner-6cfb858756-l8gxr 0/1 CrashLoopBackOff 6 28m
gitlab-gitlab-shell-6cc87fcd4c-2mqph 1/1 Running 0 28m
gitlab-gitlab-shell-6cc87fcd4c-jvp8n 1/1 Running 0 27m
gitlab-issuer.1-cx8tm 0/1 Completed 0 28m
gitlab-nginx-ingress-controller-5f486c5f7b-md8rj 1/1 Running 0 28m
gitlab-nginx-ingress-controller-5f486c5f7b-rps6m 1/1 Running 0 28m
gitlab-nginx-ingress-controller-5f486c5f7b-xc8fv 1/1 Running 0 28m
gitlab-nginx-ingress-default-backend-7f87d67c8-6xhhz 1/1 Running 0 28m
gitlab-nginx-ingress-default-backend-7f87d67c8-7w2s2 1/1 Running 0 28m
gitlab-registry-8dfc8f979-9hdbr 0/1 Init:0/2 0 28m
gitlab-registry-8dfc8f979-qr5nd 0/1 Init:0/2 0 27m
gitlab-sidekiq-all-in-1-88f47878-26nh8 0/1 Init:CrashLoopBackOff 7 28m
gitlab-task-runner-74fc4ccdb9-pm592 1/1 Running 0 28m
gitlab-unicorn-5b74ffdff8-4kkj4 0/2 Init:CrashLoopBackOff 7 28m
gitlab-unicorn-5b74ffdff8-nz662 0/2 Init:CrashLoopBackOff 7 27m
kube-state-metrics-57b88466db-h7xkj 1/1 Running 0 27m
node-exporter-q4bpv 1/1 Running 0 27m
node-exporter-x8mtj 1/1 Running 0 27m
node-exporter-xrdlv 1/1 Running 0 27m
prometheus-k8s-5cf4c4cf6c-hsntr 2/2 Running 1 27m
Possibly this is because the command is still in beta and not all features and/or options work correctly.
I advise checking whether you have just one network available.
You can do that by using gcloud compute networks list.
$ gcloud compute networks list
NAME SUBNET_MODE BGP_ROUTING_MODE IPV4_RANGE GATEWAY_IPV4
default AUTO REGIONAL
If you only see the default network, there is no need to provide the --network flag at all.
Also, from the look of it, the instance needs either a public or a private IP, so you can leave out the --no-assign-ip flag.
A working command might look like this:
gcloud beta sql instances create gitlab-db --database-version=POSTGRES_9_6 --cpu 4 --memory 15 --storage-auto-increase --zone us-central1-a
You can read about the flags and usage in the docs for gcloud beta sql instances create.
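If you do want the instance on a private IP (--no-assign-ip), the "invalid private network" error usually means the VPC has no private services access configured. A sketch of setting that up (the reserved range name is an assumption):

```shell
# Reserve an IP range for Google-managed services on the default VPC.
gcloud compute addresses create google-managed-services-default \
  --global --purpose=VPC_PEERING --prefix-length=16 --network=default

# Peer the VPC with the Service Networking producer network so Cloud SQL
# can allocate a private IP inside it.
gcloud services vpc-peerings connect \
  --service=servicenetworking.googleapis.com \
  --ranges=google-managed-services-default --network=default
```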

Volume is already attached by pod

I installed Kubernetes on Ubuntu on bare metal, with 1 master and 3 workers, then deployed Rook, and everything worked fine. But when I deployed WordPress on it, it got stuck in ContainerCreating, so I deleted the WordPress deployment, and now I get this error:
Volume is already attached by pod
default/wordpress-mysql-b78774f44-gvr58. Status Running
#kubectl describe pods wordpress-mysql-b78774f44-bjc2c
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m21s default-scheduler Successfully assigned default/wordpress-mysql-b78774f44-bjc2c to worker2
Warning FailedMount 2m57s (x6 over 3m16s) kubelet, worker2 MountVolume.SetUp failed for volume "pvc-dcba7817-553b-11e9-a229-52540076d16c" : mount command failed, status: Failure, reason: Rook: Mount volume failed: failed to attach volume pvc-dcba7817-553b-11e9-a229-52540076d16c for pod default/wordpress-mysql-b78774f44-bjc2c. Volume is already attached by pod default/wordpress-mysql-b78774f44-gvr58. Status Running
Normal Pulling 2m26s kubelet, worker2 Pulling image "mysql:5.6"
Normal Pulled 110s kubelet, worker2 Successfully pulled image "mysql:5.6"
Normal Created 106s kubelet, worker2 Created container mysql
Normal Started 101s kubelet, worker2 Started container mysql
For more information:
# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-dcba7817-553b-11e9-a229-52540076d16c 20Gi RWO Delete Bound default/mysql-pv-claim rook-ceph-block 13m
pvc-e9797517-553b-11e9-a229-52540076d16c 20Gi RWO Delete Bound default/wp-pv-claim rook-ceph-block 13m
#kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mysql-pv-claim Bound pvc-dcba7817-553b-11e9-a229-52540076d16c 20Gi RWO rook-ceph-block 15m
wp-pv-claim Bound pvc-e9797517-553b-11e9-a229-52540076d16c 20Gi RWO rook-ceph-block 14m
#kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default wordpress-595685cc49-sdbfk 1/1 Running 6 9m58s
default wordpress-mysql-b78774f44-bjc2c 1/1 Running 0 8m14s
kube-system coredns-fb8b8dccf-plnt4 1/1 Running 0 46m
kube-system coredns-fb8b8dccf-xrkql 1/1 Running 0 47m
kube-system etcd-master 1/1 Running 0 46m
kube-system kube-apiserver-master 1/1 Running 0 46m
kube-system kube-controller-manager-master 1/1 Running 1 46m
kube-system kube-flannel-ds-amd64-45bsf 1/1 Running 0 40m
kube-system kube-flannel-ds-amd64-5nxfz 1/1 Running 0 40m
kube-system kube-flannel-ds-amd64-pnln9 1/1 Running 0 40m
kube-system kube-flannel-ds-amd64-sg4pv 1/1 Running 0 40m
kube-system kube-proxy-2xsrn 1/1 Running 0 47m
kube-system kube-proxy-mll8b 1/1 Running 0 42m
kube-system kube-proxy-mv5dw 1/1 Running 0 42m
kube-system kube-proxy-v2jww 1/1 Running 0 42m
kube-system kube-scheduler-master 1/1 Running 0 46m
rook-ceph-system rook-ceph-agent-8pbtv 1/1 Running 0 26m
rook-ceph-system rook-ceph-agent-hsn27 1/1 Running 0 26m
rook-ceph-system rook-ceph-agent-qjqqx 1/1 Running 0 26m
rook-ceph-system rook-ceph-operator-d97564799-9szvr 1/1 Running 0 27m
rook-ceph-system rook-discover-26g84 1/1 Running 0 26m
rook-ceph-system rook-discover-hf7lc 1/1 Running 0 26m
rook-ceph-system rook-discover-jc72g 1/1 Running 0 26m
rook-ceph rook-ceph-mgr-a-68cb58b456-9rrj7 1/1 Running 0 21m
rook-ceph rook-ceph-mon-a-6469b4c68f-cq6mj 1/1 Running 0 23m
rook-ceph rook-ceph-mon-b-d59cfd758-2d2zt 1/1 Running 0 22m
rook-ceph rook-ceph-mon-c-79664b789-wl4t4 1/1 Running 0 21m
rook-ceph rook-ceph-osd-0-8778dbbc-d84mh 1/1 Running 0 19m
rook-ceph rook-ceph-osd-1-84974b86f6-z5c6c 1/1 Running 0 19m
rook-ceph rook-ceph-osd-2-84f9b78587-czx6d 1/1 Running 0 19m
rook-ceph rook-ceph-osd-prepare-worker1-x4rqc 0/2 Completed 0 20m
rook-ceph rook-ceph-osd-prepare-worker2-29jpg 0/2 Completed 0 20m
rook-ceph rook-ceph-osd-prepare-worker3-rkp52 0/2 Completed 0 20m
You are using a storage class whose PVC access mode is ReadWriteOnce. This does not mean the PVC can only be attached to one pod, but that it can only be mounted on one node.
ReadWriteOnce – the volume can be mounted as read-write by a single node
ReadWriteMany – the volume can be mounted as read-write by many nodes
Here, it seems you have two pods trying to mount this volume. This will be flaky unless you do one of the following:
Schedule both pods on the same node
Use other storageClasses such as NFS (FileSystem) to change policy to ReadWriteMany
Downscale to 1 pod, so you don't have to share the volume
Right now you have two pods trying to mount the same volume: default/wordpress-mysql-b78774f44-gvr58 and default/wordpress-mysql-b78774f44-bjc2c.
Downscaling to 1 pod avoids the problem altogether:
kubectl scale deploy wordpress-mysql --replicas=1
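If you go the ReadWriteMany route instead, the claim would look something like this (a sketch; it requires a storage class that actually supports RWX, such as a CephFS or NFS provisioner, and "rook-cephfs" is an assumed class name):

```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim-rwx
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: rook-cephfs     # assumed RWX-capable class
  resources:
    requests:
      storage: 20Gi
EOF
```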