How do I get Kubernetes Persistent Volumes to deploy in the proper zone?

I'm running a kubernetes 1.6.2 cluster across three nodes in different zones in GKE and I'm trying to deploy my statefulset where each pod in the statefulset gets a PV attached to it. The problem is that kubernetes is creating the PVs in the one zone where I don't have a node!
$ kubectl describe node gke-multi-consul-default-pool-747c9378-zls3|grep 'zone=us-central1'
failure-domain.beta.kubernetes.io/zone=us-central1-a
$ kubectl describe node gke-multi-consul-default-pool-7e987593-qjtt|grep 'zone=us-central1'
failure-domain.beta.kubernetes.io/zone=us-central1-f
$ kubectl describe node gke-multi-consul-default-pool-8e9199ea-91pj|grep 'zone=us-central1'
failure-domain.beta.kubernetes.io/zone=us-central1-c
$ kubectl describe pv pvc-3f668058-2c2a-11e7-a7cd-42010a8001e2|grep 'zone=us-central1'
failure-domain.beta.kubernetes.io/zone=us-central1-b
I'm using the standard storageclass which has no default zone set:
$ kubectl describe storageclass standard
Name: standard
IsDefaultClass: Yes
Annotations: storageclass.beta.kubernetes.io/is-default-class=true
Provisioner: kubernetes.io/gce-pd
Parameters: type=pd-standard
Events: <none>
So I thought that the scheduler would automatically provision the volumes in a zone where a cluster node existed, but it doesn't seem to be doing that.
For reference, here is the yaml for my statefulset:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: "{{ template "fullname" . }}"
  labels:
    heritage: {{.Release.Service | quote }}
    release: {{.Release.Name | quote }}
    chart: "{{.Chart.Name}}-{{.Chart.Version}}"
    component: "{{.Release.Name}}-{{.Values.Component}}"
spec:
  serviceName: "{{ template "fullname" . }}"
  replicas: {{default 3 .Values.Replicas}}
  template:
    metadata:
      name: "{{ template "fullname" . }}"
      labels:
        heritage: {{.Release.Service | quote }}
        release: {{.Release.Name | quote }}
        chart: "{{.Chart.Name}}-{{.Chart.Version}}"
        component: "{{.Release.Name}}-{{.Values.Component}}"
        app: "consul"
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      securityContext:
        fsGroup: 1000
      containers:
      - name: "{{ template "fullname" . }}"
        image: "{{.Values.Image}}:{{.Values.ImageTag}}"
        imagePullPolicy: "{{.Values.ImagePullPolicy}}"
        ports:
        - name: http
          containerPort: {{.Values.HttpPort}}
        - name: rpc
          containerPort: {{.Values.RpcPort}}
        - name: serflan-tcp
          protocol: "TCP"
          containerPort: {{.Values.SerflanPort}}
        - name: serflan-udp
          protocol: "UDP"
          containerPort: {{.Values.SerflanUdpPort}}
        - name: serfwan-tcp
          protocol: "TCP"
          containerPort: {{.Values.SerfwanPort}}
        - name: serfwan-udp
          protocol: "UDP"
          containerPort: {{.Values.SerfwanUdpPort}}
        - name: server
          containerPort: {{.Values.ServerPort}}
        - name: consuldns
          containerPort: {{.Values.ConsulDnsPort}}
        resources:
          requests:
            cpu: "{{.Values.Cpu}}"
            memory: "{{.Values.Memory}}"
        env:
        - name: INITIAL_CLUSTER_SIZE
          value: {{ default 3 .Values.Replicas | quote }}
        - name: STATEFULSET_NAME
          value: "{{ template "fullname" . }}"
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: STATEFULSET_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/consul
        - name: gossip-key
          mountPath: /etc/secrets
          readOnly: true
        - name: config
          mountPath: /etc/consul
        - name: tls
          mountPath: /etc/tls
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - consul leave
        livenessProbe:
          exec:
            command:
            - consul
            - members
          initialDelaySeconds: 300
          timeoutSeconds: 5
        command:
        - "/bin/sh"
        - "-ec"
        - "/tmp/consul-start.sh"
      volumes:
      - name: config
        configMap:
          name: consul
      - name: gossip-key
        secret:
          secretName: {{ template "fullname" . }}-gossip-key
      - name: tls
        secret:
          secretName: consul
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
      {{- if .Values.StorageClass }}
        volume.beta.kubernetes.io/storage-class: {{.Values.StorageClass | quote}}
      {{- else }}
        volume.alpha.kubernetes.io/storage-class: default
      {{- end }}
    spec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          # upstream recommended max is 700M
          storage: "{{.Values.Storage}}"

There is a bug open for this issue here.
The workaround in the meantime is to set the zones parameter in your StorageClass to specify the exact zones where your Kubernetes cluster has nodes. Here is an example.
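A rough sketch of such a StorageClass, assuming the three zones shown in the node labels above (the class name standard-zoned is illustrative):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard-zoned
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  # comma-separated list of zones that actually contain cluster nodes
  zones: us-central1-a,us-central1-c,us-central1-f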

From the Kubernetes documentation about Persistent Volumes (https://kubernetes.io/docs/concepts/storage/persistent-volumes/#gce):
zone: GCE zone. If not specified, a random zone in the same region as controller-manager will be chosen.
I guess your controller manager is in region us-central1, so any zone in that region can be chosen. In your case the only zone that is not covered is us-central1-b, so you either have to start a node there as well, or set the zone in the StorageClass resource.

You could create a storage class for each zone; a PV/PVC may then specify that storage class. Your stateful sets/deployments could be set up to target a specific node via nodeSelector so they always get scheduled on a node in a specific zone (see built-in node labels).
storage_class.yml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: us-central-1a
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zone: us-central1-a
persistent_volume.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: some-volume
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: us-central-1a
Note that you can use storageClassName in Kubernetes 1.6; otherwise the annotation volume.beta.kubernetes.io/storage-class should work too (though it will be deprecated in the future).
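For instance, a claim using the older annotation form might look roughly like this (the claim name and size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: some-claim
  annotations:
    # pre-1.6 style; storageClassName in spec is the current equivalent
    volume.beta.kubernetes.io/storage-class: us-central-1a
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi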

Related

Access kubernetes job pods using hostname

I am trying to run a k8s Job with 2 pods in which one pod will try to connect to the other pod.
I cannot connect to the other pod using its hostname as suggested in the doc - https://kubernetes.io/docs/concepts/workloads/controllers/job/#completion-mode.
I have created a service and am trying to access the pod as k8s-train-0.default.svc.cluster.local, as mentioned in the document.
apiVersion: batch/v1
kind: Job
metadata:
  name: k8s-train
spec:
  parallelism: 2
  completions: 2
  completionMode: Indexed
  manualSelector: true
  selector:
    matchLabels:
      app.kubernetes.io/name: proxy
  template:
    metadata:
      labels:
        app.kubernetes.io/name: proxy
    spec:
      containers:
      - name: k8s-train
        image: pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime
        command: ["/bin/sh","-c"]
        args:
        - echo starting;
          export MASTER_PORT=54321;
          export MASTER_ADDR=k8s-train-0.trainsvc.default.svc.cluster.local;
          export WORLD_SIZE=8;
          pip install -r /data/requirements.txt;
          export NCCL_DEBUG=INFO;
          python /data/bert.py --strategy=ddp --num_nodes=2 --gpus=4 --max_epochs=3;
          echo done;
        env:
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        ports:
        - containerPort: 54321
          name: master-port
        resources:
          requests:
            nvidia.com/gpu: 4
          limits:
            nvidia.com/gpu: 4
        volumeMounts:
        - mountPath: /data
          name: data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: efs-claim
      restartPolicy: Never
  backoffLimit: 0
---
apiVersion: v1
kind: Service
metadata:
  name: trainsvc
spec:
  selector:
    app.kubernetes.io/name: proxy
  ports:
  - name: master-svc-port
    protocol: TCP
    port: 54321
    targetPort: master-port
  clusterIP: None
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
I am looking to establish communication between the pods, either using the hostname or by assigning the svc to only one pod selected with the job index.
Please let me know if I'm missing something here.
Thanks.
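One detail worth checking (a hedged sketch, not a verified fix for this setup): the upstream examples of pod-to-pod Job communication pair an Indexed Job with a headless Service by setting the pod template's subdomain to the Service name, so that per-pod DNS records such as k8s-train-0.trainsvc.default.svc.cluster.local get published. Applied to the manifest above, the addition would look roughly like:

apiVersion: batch/v1
kind: Job
metadata:
  name: k8s-train
spec:
  completionMode: Indexed
  template:
    spec:
      subdomain: trainsvc   # must match the headless Service name
      # ... containers and volumes unchanged from the question ...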

Helm/Kubernetes - StatefulSet: is it possible to use a serviceAccount from a different namespace?

Following on from the problem mentioned here:
I am wondering if it is possible to refer to serviceAccountName: "test-sa", which is in namespace n2 for example, when creating a StatefulSet in namespace n1.
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: "{{.Values.ContainerName}}"
  namespace: n1
  labels:
    name: "{{.Values.ReplicaName}}"
    app: "{{.Values.ContainerName}}"
    chart: "{{.Chart.Name}}-{{.Chart.Version}}"
  annotations:
    "helm.sh/created": {{.Release.Time.Seconds | quote }}
spec:
  selector:
    matchLabels:
      app: "{{.Values.ContainerName}}"
  serviceName: "{{.Values.ContainerName}}"
  replicas: 2
  template:
    metadata:
      labels:
        app: "{{.Values.ContainerName}}"
    spec:
      serviceAccountName: "test-sa"
      securityContext:
        fsGroup: 26
      terminationGracePeriodSeconds: 10
      containers:
      - name: {{.Values.ContainerName}}
        image: "{{.Values.PostgresImage}}"
        ports:
        - containerPort: 5432
          name: postgres
        resources:
          requests:
            cpu: {{default "100m" .Values.Cpu}}
            memory: {{default "100M" .Values.Memory}}
        env:
        - name: PGHOST
          value: /tmp
        - name: PG_PRIMARY_USER
          value: primaryuser
        - name: PG_MODE
          value: set
        - name: PG_PRIMARY_HOST
          value: "{{.Values.PrimaryName}}"
        - name: PG_PRIMARY_PORT
          value: "5432"
        - name: PG_PRIMARY_PASSWORD
          value: "{{.Values.PrimaryPassword}}"
        - name: PG_USER
          value: testuser
        - name: PG_PASSWORD
          value: "{{.Values.UserPassword}}"
        - name: PG_DATABASE
          value: userdb
        - name: PG_ROOT_PASSWORD
          value: "{{.Values.RootPassword}}"
        volumeMounts:
        - name: pgdata
          mountPath: "/pgdata"
          readOnly: false
      volumes:
      - name: pgdata
        persistentVolumeClaim:
          claimName: {{.Values.PVCName}}
You can't; they need to be in the same namespace.
This is a more general rule. Whenever one object refers to another they generally need to be in the same namespace, or the target needs to be a cluster-global object. If a Pod references data in a ConfigMap or mounts a PersistentVolumeClaim, those need to be in the same namespace; if a Service selects Pods by label, those need to be in the same namespace. There are a couple of exceptions, notably around RBAC, but usually these things need to be deployed together.
In the context of a Helm chart, I'd just create a new ServiceAccount rather than trying to reuse an existing one. If it uses the typical {{ .Release.Name }}-{{ .Chart.Name }} naming pattern there won't generally be naming conflicts.
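As a rough sketch of that approach (the template file name and naming pattern are illustrative, not taken from the chart above):

# templates/serviceaccount.yaml (hypothetical file in the same chart)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ .Release.Name }}-{{ .Chart.Name }}
  namespace: {{ .Release.Namespace }}

The StatefulSet's pod spec would then set serviceAccountName: {{ .Release.Name }}-{{ .Chart.Name }}, so both objects land in the release's namespace together.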

kubernetes StorageClass does not retain existing data

My Kubernetes StorageClass volume doesn't retain existing data when the pod is deleted and deployed back with my postgresql database. When I delete the pod, the new pod is created but the database is empty.
I have followed variations of the different versions of the tutorials (https://kubernetes.io/docs/concepts/storage/persistent-volumes/) but nothing seems to work.
I'm pasting all the YAML files because the problem might be in the combination.
storage-google.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spingular-pvc
spec:
  storageClassName: standard
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 7Gi
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zone: us-east4-a
jhipsterpress-postgresql.yml
apiVersion: v1
kind: Secret
metadata:
  name: jhipsterpress-postgresql
  namespace: default
  labels:
    app: jhipsterpress-postgresql
type: Opaque
data:
  postgres-password: NjY0NXJxd24=
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jhipsterpress-postgresql
  namespace: default
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: jhipsterpress-postgresql
    spec:
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: spingular-pvc
      containers:
      - name: postgres
        image: postgres:10.4
        env:
        - name: POSTGRES_USER
          value: jhipsterpress
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: jhipsterpress-postgresql
              key: postgres-password
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/
---
apiVersion: v1
kind: Service
metadata:
  name: jhipsterpress-postgresql
  namespace: default
spec:
  selector:
    app: jhipsterpress-postgresql
  ports:
  - name: postgresqlport
    port: 5432
  type: LoadBalancer
jhipsterpress-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jhipsterpress
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jhipsterpress
      version: "v1"
  template:
    metadata:
      labels:
        app: jhipsterpress
        version: "v1"
    spec:
      initContainers:
      - name: init-ds
        image: busybox:latest
        command:
        - '/bin/sh'
        - '-c'
        - |
          while true
          do
            rt=$(nc -z -w 1 jhipsterpress-postgresql 5432)
            if [ $? -eq 0 ]; then
              echo "DB is UP"
              break
            fi
            echo "DB is not yet reachable;sleep for 10s before retry"
            sleep 10
          done
      containers:
      - name: jhipsterpress-app
        image: galore/jhipsterpress
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: prod
        - name: SPRING_DATASOURCE_URL
          value: jdbc:postgresql://jhipsterpress-postgresql.default.svc.cluster.local:5432/jhipsterpress
        - name: SPRING_DATASOURCE_USERNAME
          value: jhipsterpress
        - name: SPRING_DATASOURCE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: jhipsterpress-postgresql
              key: postgres-password
        - name: JAVA_OPTS
          value: " -Xmx256m -Xms256m"
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "1"
        ports:
        - name: http
          containerPort: 8080
        readinessProbe:
          httpGet:
            path: /management/health
            port: http
          initialDelaySeconds: 20
          periodSeconds: 15
          failureThreshold: 6
        livenessProbe:
          httpGet:
            path: /management/health
            port: http
          initialDelaySeconds: 120
jhipsterpress-service.yml
apiVersion: v1
kind: Service
metadata:
  name: jhipsterpress
  namespace: default
  labels:
    app: jhipsterpress
spec:
  selector:
    app: jhipsterpress
  type: LoadBalancer
  ports:
  - name: http
    port: 8080
When I included a Retain Policy I was getting this error:
#cloudshell:~ (academic-veld-230622)$ kubectl apply -f storage-google.yaml
error: error validating "storage-google.yaml": error validating data:
ValidationError(PersistentVolumeClaim.spec): unknown field "persistentVolumeReclaimPolicy" in io.k8s.api.core.v1.PersistentVolumeClaimSpec; if you choose to ignore these errors, turn validation off with --validate=false
Please, if you know of a complete example on a public image that works (with postgresql; I can make it work with Mongo), I would really appreciate it.
Thanks to all.
Note that for this to work you need your PVC to dynamically provision a PV to satisfy its requirements; there will then be a permanent binding between the PVC and PV, and every time your workload uses the PVC it will use the same PV. Specifically, as indicated by this excerpt:
If a PV was dynamically provisioned for a new PVC, the loop will always bind that PV to the PVC
If in your case the Google Persistent Disk is being provisioned by the PVC, and you can verify on GCP that the same PV is used every time, then it's probably an issue with the pod startup process removing all the data. (Is there any reason why you are using /var/lib/postgresql/ vs /var/lib/postgresql?)
Also, persistentVolumeReclaimPolicy: Retain applies to a PV, not a PVC. For dynamically provisioned PVs the value is Delete. In your case, it wouldn't apply because your dynamically provisioned volume should be bound to your PVC. In other words, you are not reclaiming the volume.
Having said all that, the recommended way to deploy a DB is using StatefulSets, similar to this mysql example, which uses a volumeClaimTemplate.
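A minimal sketch of that pattern for this Postgres setup (it reuses names from the question; the storage class, size, and mount path are assumptions, not taken from the original manifests):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jhipsterpress-postgresql
spec:
  serviceName: jhipsterpress-postgresql
  replicas: 1
  selector:
    matchLabels:
      app: jhipsterpress-postgresql
  template:
    metadata:
      labels:
        app: jhipsterpress-postgresql
    spec:
      containers:
      - name: postgres
        image: postgres:10.4
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 7Gi

Each replica then gets its own claim (data-jhipsterpress-postgresql-0, and so on) that survives pod deletion.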

SchedulerPredicates failed due to PersistentVolumeClaim is not bound

I am using helm with kubernetes on Google Cloud Platform.
I get the following error for my postgres deployment:
SchedulerPredicates failed due to PersistentVolumeClaim is not bound
It looks like it can't connect to the persistent storage, but I don't understand why, because the persistent storage loaded fine.
I have tried deleting the helm release completely, then on google-cloud-console > compute-engine > disks I have deleted all persistent disks, and finally tried to install from the helm chart again, but the postgres deployment still doesn't connect to the PVC.
my database configuration:
{{- $serviceName := "db-service" -}}
{{- $deploymentName := "db-deployment" -}}
{{- $pvcName := "db-disk-claim" -}}
{{- $pvName := "db-disk" -}}
apiVersion: v1
kind: Service
metadata:
  name: {{ $serviceName }}
  labels:
    name: {{ $serviceName }}
    env: production
spec:
  type: LoadBalancer
  ports:
  - port: 5432
    targetPort: 5432
    protocol: TCP
    name: http
  selector:
    name: {{ $deploymentName }}
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: {{ $deploymentName }}
  labels:
    name: {{ $deploymentName }}
    env: production
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: {{ $deploymentName }}
        env: production
    spec:
      containers:
      - name: postgres-database
        image: postgres:alpine
        imagePullPolicy: Always
        env:
        - name: POSTGRES_USER
          value: test-user
        - name: POSTGRES_PASSWORD
          value: test-password
        - name: POSTGRES_DB
          value: test_db
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        ports:
        - containerPort: 5432
        volumeMounts:
        - mountPath: "/var/lib/postgresql/data/pgdata"
          name: {{ $pvcName }}
      volumes:
      - name: {{ $pvcName }}
        persistentVolumeClaim:
          claimName: {{ $pvcName }}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{ $pvcName }}
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      name: {{ $pvName }}
      env: production
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: {{ .Values.gcePersistentDisk }}
  labels:
    name: {{ $pvName }}
    env: production
  annotations:
    volume.beta.kubernetes.io/mount-options: "discard"
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  gcePersistentDisk:
    fsType: "ext4"
    pdName: {{ .Values.gcePersistentDisk }}
Is this config for Kubernetes correct? I have read the documentation and it looks like this should work. I'm new to Kubernetes and helm, so any advice is appreciated.
EDIT:
I have added a PersistentVolume and linked it to the PersistentVolumeClaim to see if that helps, but it seems that when I do this, the PersistentVolumeClaim status becomes stuck in "pending" (resulting in the same issue as before).
You don't have a bound PV for this claim. What storage are you using for this claim? You need to specify it in the PVC file.
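For example, a minimal sketch of a claim that asks a named storage class to dynamically provision a disk (the class name standard is GKE's default and is only an assumption here):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-disk-claim
spec:
  storageClassName: standard   # must match an existing StorageClass
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Note that a claim carrying a label selector, like the one in the question, generally will not be dynamically provisioned; it only binds to a pre-created PV whose labels match, which is a common reason for a claim staying Pending.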

Host specific volumes in Kubernetes manifests

I am fairly sure this isn't possible, but I wanted to check.
I am using Kubernetes stateful sets, so my pods get predictable hostnames.
I'd like them to provision a hostPath mount that is mapped to their hostname.
An example helm chart that I'm using might look like this:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: app
  namespace: '{{ .Values.name }}'
  labels:
    chart: '{{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}'
spec:
  serviceName: "app"
  replicas: {{ .Values.replicaCount }}
  template:
    metadata:
      labels:
        app: app
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}/{{ .Values.image.version}}"
        imagePullPolicy: '{{ .Values.image.pullPolicy }}'
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        ports:
        - containerPort: {{ .Values.baseport | add 80 }}
          name: app
        volumeMounts:
        - mountPath: /NAS/$(POD_NAME)
          name: store
          readOnly: true
      volumes:
      - name: store
        hostPath:
          path: /NAS/$(POD_NAME)
Essentially, instead of hardcoding a volume, I'd like to have some kind of dynamic variable as the path. I don't mind using helm or the downward API for this, but ideally it would work when I scale the stateful set outwards.
Is there any way of doing this? All my docs reading seems to think it's not... :(
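One pattern that comes close, assuming a cluster recent enough to have subPathExpr (GA since Kubernetes 1.17), is to mount a fixed hostPath and pick the per-pod subdirectory with subPathExpr, which expands container environment variables. This is only a sketch of the relevant pod-template fragment, not a drop-in change to the chart above:

      containers:
      - name: app
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        volumeMounts:
        - name: store
          mountPath: /NAS
          readOnly: true
          subPathExpr: $(POD_NAME)   # mounts /NAS/<pod-name> from the host volume
      volumes:
      - name: store
        hostPath:
          path: /NAS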