I am pretty new to Kubernetes, so apologies if my question seems vague; I will try to elaborate as much as possible. I have a pod on Google Cloud via Kubernetes that has a GPU in it. This GPU is responsible for processing one set of tasks, let's say classifying images. To expose it, I created a Service with Kubernetes. The Deployment and Service section of my YAML file looks like the following. The URL for this service will be http://model-server-service.default.svc.cluster.local, since the name of the Service is model-server-service.
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: model-server
name: model-server
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: model-server
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
labels:
app: model-server
spec:
containers:
- args:
- -t
- "120"
- -b
- "0.0.0.0"
- app:flask_app
command:
- gunicorn
env:
- name: ENV
value: staging
- name: GCP
value: "2"
image: gcr.io/my-production/my-model-server:myGitHash
imagePullPolicy: Always
name: model-server
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
resources:
limits:
nvidia.com/gpu: 1
ports:
- containerPort: 8000
protocol: TCP
volumeMounts:
- name: model-files
mountPath: /model-server/models
# These containers are run during pod initialization
initContainers:
- name: model-download
image: gcr.io/my-production/my-model-server:myGitHash
command:
- gsutil
- cp
- -r
- gs://my-staging-models/*
- /model-files/
volumeMounts:
- name: model-files
mountPath: "/model-files"
volumes:
- name: model-files
emptyDir: {}
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
runAsUser: 0
terminationGracePeriodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
labels:
app: model-server
name: model-server-service
namespace: default
spec:
ports:
- port: 80
protocol: TCP
targetPort: 8000
selector:
app: model-server
sessionAffinity: None
type: ClusterIP
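For reference, a quick way to sanity-check that the Service above resolves at the expected DNS name, as a sketch using a throwaway busybox pod (the image tag is just an example):
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- nslookup model-server-service.default.svc.cluster.local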
Here my question begins. I am creating a new set of tasks. For this new set of tasks, I will need extensive memory, so I do not want to use the previous service; I would like to handle it as part of a separate new service, with a URL like http://model-server-heavy-service.default.svc.cluster.local. I tried to create a new YAML file, model-server-heavy.yaml. In this new YAML file, I changed the name of the Service from model-server-service to model-server-heavy-service. I also changed the app label and the name from model-server to model-server-heavy. The final YAML file looks like what I put at the end of this post. Unfortunately, the new model server does not work, and I get the following status for the new model server on Kubernetes:
model-server-asdhjs-asd 1/1 Running 0 21m
model-server-heavy-xnshk 0/1 CrashLoopBackOff 8 21m
Can someone please shed some light on what I am doing wrong and what the alternative would be for what I have in mind? Why do I get CrashLoopBackOff for the second model server? What is it that I am not doing correctly for the second model server?
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: model-server-heavy
name: model-server-heavy
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: model-server-heavy
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
labels:
app: model-server-heavy
spec:
containers:
- args:
- -t
- "120"
- -b
- "0.0.0.0"
- app:flask_app
command:
- gunicorn
env:
- name: ENV
value: staging
- name: GCP
value: "2"
image: gcr.io/my-production/my-model-server:myGitHash
imagePullPolicy: Always
name: model-server-heavy
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
resources:
limits:
nvidia.com/gpu: 1
ports:
- containerPort: 8000
protocol: TCP
volumeMounts:
- name: model-files
mountPath: /model-server-heavy/models
# These containers are run during pod initialization
initContainers:
- name: model-download
image: gcr.io/my-production/my-model-server:myGitHash
command:
- gsutil
- cp
- -r
- gs://my-staging-models/*
- /model-files/
volumeMounts:
- name: model-files
mountPath: "/model-files"
volumes:
- name: model-files
emptyDir: {}
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
runAsUser: 0
terminationGracePeriodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
labels:
app: model-server-heavy
name: model-server-heavy-service
namespace: default
spec:
ports:
- port: 80
protocol: TCP
targetPort: 8000
selector:
app: model-server-heavy
sessionAffinity: None
type: ClusterIP
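For anyone hitting the same symptom, a minimal sketch of the commands that usually surface the cause of a CrashLoopBackOff (using the pod name from the output above):
# recent events often show the reason (failed mounts, bad image, OOM, etc.)
kubectl describe pod model-server-heavy-xnshk
# output of the crashed container's previous run (e.g. gunicorn startup errors)
kubectl logs model-server-heavy-xnshk --previous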
Thanks to #dawid-kruk and #patrick-w, I had to make two modifications in model-server-heavy.yaml in order for it to work:
Change the mountPath from /model-server-heavy/models to /model-server/models.
In line 38 of the model-server-heavy.yaml file, I should have changed the name from model-server-heavy to model-server.
I first tried to fix the problem by applying item 1, but it didn't work. Then I applied item 2 as well, and that fixed it; I need both 1 and 2 in place for the server to work. I understand why I had to make the change in item 1, but I am not sure about item 2.
I am trying to run a k8s Job with 2 pods in which one pod will try to connect to the other pod.
I cannot connect to the other pod using the pod's hostname, as suggested in the doc: https://kubernetes.io/docs/concepts/workloads/controllers/job/#completion-mode.
I have created a Service and am trying to access the pod as k8s-train-0.default.svc.cluster.local, as mentioned in the document.
apiVersion: batch/v1
kind: Job
metadata:
name: k8s-train
spec:
parallelism: 2
completions: 2
completionMode: Indexed
manualSelector: true
selector:
matchLabels:
app.kubernetes.io/name: proxy
template:
metadata:
labels:
app.kubernetes.io/name: proxy
spec:
containers:
- name: k8s-train
image: pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime
command: ["/bin/sh","-c"]
args:
- echo starting;
export MASTER_PORT=54321;
export MASTER_ADDR=k8s-train-0.trainsvc.default.svc.cluster.local;
export WORLD_SIZE=8;
pip install -r /data/requirements.txt;
export NCCL_DEBUG=INFO;
python /data/bert.py --strategy=ddp --num_nodes=2 --gpus=4 --max_epochs=3;
echo done;
env:
- name: MY_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
ports:
- containerPort: 54321
name: master-port
resources:
requests:
nvidia.com/gpu: 4
limits:
nvidia.com/gpu: 4
volumeMounts:
- mountPath: /data
name: data
volumes:
- name: data
persistentVolumeClaim:
claimName: efs-claim
restartPolicy: Never
backoffLimit: 0
---
apiVersion: v1
kind: Service
metadata:
name: trainsvc
spec:
selector:
app.kubernetes.io/name: proxy
ports:
- name: master-svc-port
protocol: TCP
port: 54321
targetPort: master-port
clusterIP: None
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
I am looking to establish communication between the pods, either by using the hostname or by assigning the Service to only one pod selected by job index.
Please let me know if I'm missing something here.
Thanks.
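If I read the docs example correctly, the pod-to-pod hostname pattern also requires the Job's Pod template to set spec.subdomain to the headless Service name (trainsvc here), so that names like k8s-train-0.trainsvc.default.svc.cluster.local get DNS records. A hedged sketch of how this would be applied and verified (Job pod templates are immutable, so the Job has to be deleted and re-applied; the filename is hypothetical):
# add under the Job's spec.template.spec before re-applying:
#   subdomain: trainsvc
kubectl delete job k8s-train
kubectl apply -f k8s-train.yaml
# then, from any pod in the namespace, the indexed pod name should resolve:
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- nslookup k8s-train-0.trainsvc.default.svc.cluster.local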
I have a Redis DB setup running on my minikube cluster. I shut down my minikube and started it again after 3 days, and I can see my Redis pod is failing to come up with the below error from the pod log:
Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>.
Below is my StatefulSet YAML file for the Redis master, deployed via a Helm chart:
apiVersion: apps/v1
kind: StatefulSet
metadata:
annotations:
meta.helm.sh/release-name: test-redis
meta.helm.sh/release-namespace: test
generation: 1
labels:
app.kubernetes.io/component: master
app.kubernetes.io/instance: test-redis
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: redis
helm.sh/chart: redis-14.8.11
name: test-redis-master
namespace: test
resourceVersion: "191902"
uid: 3a4e541f-154f-4c54-a379-63974d90089e
spec:
podManagementPolicy: OrderedReady
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/component: master
app.kubernetes.io/instance: test-redis
app.kubernetes.io/name: redis
serviceName: test-redis-headless
template:
metadata:
annotations:
checksum/configmap: dd1f90e0231e5f9ebd1f3f687d534d9ec53df571cba9c23274b749c01e5bc2bb
checksum/health: xxxxx
creationTimestamp: null
labels:
app.kubernetes.io/component: master
app.kubernetes.io/instance: test-redis
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: redis
helm.sh/chart: redis-14.8.11
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/component: master
app.kubernetes.io/instance: test-redis
app.kubernetes.io/name: redis
namespaces:
- tyk
topologyKey: kubernetes.io/hostname
weight: 1
containers:
- args:
- -c
- /opt/bitnami/scripts/start-scripts/start-master.sh
command:
- /bin/bash
env:
- name: BITNAMI_DEBUG
value: "false"
- name: REDIS_REPLICATION_MODE
value: master
- name: ALLOW_EMPTY_PASSWORD
value: "no"
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
key: redis-password
name: test-redis
- name: REDIS_TLS_ENABLED
value: "no"
- name: REDIS_PORT
value: "6379"
image: docker.io/bitnami/redis:6.2.5-debian-10-r11
imagePullPolicy: IfNotPresent
livenessProbe:
exec:
command:
- sh
- -c
- /health/ping_liveness_local.sh 5
failureThreshold: 5
initialDelaySeconds: 20
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 6
name: redis
ports:
- containerPort: 6379
name: redis
protocol: TCP
readinessProbe:
exec:
command:
- sh
- -c
- /health/ping_readiness_local.sh 1
failureThreshold: 5
initialDelaySeconds: 20
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 2
resources: {}
securityContext:
runAsUser: 1001
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /opt/bitnami/scripts/start-scripts
name: start-scripts
- mountPath: /health
name: health
- mountPath: /data
name: redis-data
- mountPath: /opt/bitnami/redis/mounted-etc
name: config
- mountPath: /opt/bitnami/redis/etc/
name: redis-tmp-conf
- mountPath: /tmp
name: tmp
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1001
serviceAccount: test-redis
serviceAccountName: test-redis
terminationGracePeriodSeconds: 30
volumes:
- configMap:
defaultMode: 493
name: test-redis-scripts
name: start-scripts
- configMap:
defaultMode: 493
name: test-redis-health
name: health
- configMap:
defaultMode: 420
name: test-redis-configuration
name: config
- emptyDir: {}
name: redis-tmp-conf
- emptyDir: {}
name: tmp
updateStrategy:
rollingUpdate:
partition: 0
type: RollingUpdate
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/component: master
app.kubernetes.io/instance: test-redis
app.kubernetes.io/name: redis
name: redis-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 8Gi
volumeMode: Filesystem
status:
phase: Pending
Please let me know your suggestions on how I can fix this.
I am not a Redis expert, but from what I can see:
kubectl describe pod red3-redis-master-0
...
Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
...
This means that your appendonly.aof file was corrupted with invalid byte sequences in the middle.
How can we proceed if redis-master is not working?
Verify the PVC attached to the redis-master pod:
kubectl get pvc
NAME STATUS VOLUME
redis-data-red3-redis-master-0 Bound pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359
Create a new redis-client pod with the same PVC, redis-data-red3-redis-master-0:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: redis-client
spec:
volumes:
- name: data
persistentVolumeClaim:
claimName: redis-data-red3-redis-master-0
containers:
- name: redis
image: docker.io/bitnami/redis:6.2.3-debian-10-r0
command: ["/bin/bash"]
args: ["-c", "sleep infinity"]
volumeMounts:
- mountPath: "/tmp"
name: data
EOF
Back up your files:
kubectl cp redis-client:/tmp .
Repair appendonly.aof file:
kubectl exec -it redis-client -- /bin/bash
cd /tmp
# make copy of appendonly.aof file:
cp appendonly.aof appendonly.aofbackup
# verify appendonly.aof file:
redis-check-aof appendonly.aof
...
0x 38: Expected prefix '*', got: '"'
AOF analyzed: size=62, ok_up_to=56, ok_up_to_line=13, diff=6
AOF is not valid. Use the --fix option to try fixing it.
...
# repair appendonly.aof file:
redis-check-aof --fix appendonly.aof
# compare files using diff:
diff appendonly.aof appendonly.aofbackup
Note:
As per docs:
The best thing to do is to run the redis-check-aof utility, initially without the --fix option, then understand the problem, jump at the given offset in the file, and see if it is possible to manually repair the file: the AOF uses the same format of the Redis protocol and is quite simple to fix manually. Otherwise it is possible to let the utility fix the file for us, but in that case all the AOF portion from the invalid part to the end of the file may be discarded, leading to a massive amount of data loss if the corruption happened to be in the initial part of the file.
In addition, as described in the comments by #Miffa Young, you can verify where your data is stored by the k8s.io/minikube-hostpath provisioner:
kubectl get pv
...
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM
pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359 8Gi RWO Delete Bound default/redis-data-red3-redis-master-0
...
kubectl describe pv pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359
...
Source:
Type: HostPath (bare host directory volume)
Path: /tmp/hostpath-provisioner/default/redis-data-red3-redis-master-0
...
Your Redis instance is failing because your appendonly.aof file is malformed and stored persistently under this location.
You can SSH into your VM:
minikube -p redis ssh
cd /tmp/hostpath-provisioner/default/redis-data-red3-redis-master-0
# from there you can backup/repair/remove your files:
Another solution is to install this chart using a new release name; in this case a new set of PVs and PVCs for the Redis StatefulSet will be created.
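A sketch of that option, assuming the Bitnami chart repo is already added and red4 is just an example release name:
helm install red4 bitnami/redis
# a fresh PVC/PV pair gets provisioned for the new release:
kubectl get pvc
kubectl get pv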
I think your Redis did not quit gracefully, so the AOF file is in a bad format (see What is AOF).
You should repair the AOF file using an initContainer that runs redis-check-aof --fix, for example:
apiVersion: apps/v1
kind: StatefulSet
metadata:
annotations:
meta.helm.sh/release-name: test-redis
meta.helm.sh/release-namespace: test
generation: 1
labels:
app.kubernetes.io/component: master
app.kubernetes.io/instance: test-redis
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: redis
helm.sh/chart: redis-14.8.11
name: test-redis-master
namespace: test
resourceVersion: "191902"
uid: 3a4e541f-154f-4c54-a379-63974d90089e
spec:
podManagementPolicy: OrderedReady
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/component: master
app.kubernetes.io/instance: test-redis
app.kubernetes.io/name: redis
serviceName: test-redis-headless
template:
metadata:
annotations:
checksum/configmap: dd1f90e0231e5f9ebd1f3f687d534d9ec53df571cba9c23274b749c01e5bc2bb
checksum/health: xxxxx
creationTimestamp: null
labels:
app.kubernetes.io/component: master
app.kubernetes.io/instance: test-redis
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: redis
helm.sh/chart: redis-14.8.11
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/component: master
app.kubernetes.io/instance: test-redis
app.kubernetes.io/name: redis
namespaces:
- tyk
topologyKey: kubernetes.io/hostname
weight: 1
initContainers:
- name: repair-redis
image: docker.io/bitnami/redis:6.2.5-debian-10-r11
command: ['sh', '-c', "redis-check-aof --fix /data/appendonly.aof"]
# the init container needs the data volume mounted so it can reach the AOF file
volumeMounts:
- name: redis-data
mountPath: /data
containers:
- args:
- -c
- /opt/bitnami/scripts/start-scripts/start-master.sh
command:
- /bin/bash
env:
- name: BITNAMI_DEBUG
value: "false"
- name: REDIS_REPLICATION_MODE
value: master
- name: ALLOW_EMPTY_PASSWORD
value: "no"
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
key: redis-password
name: test-redis
- name: REDIS_TLS_ENABLED
value: "no"
- name: REDIS_PORT
value: "6379"
image: docker.io/bitnami/redis:6.2.5-debian-10-r11
imagePullPolicy: IfNotPresent
livenessProbe:
exec:
command:
- sh
- -c
- /health/ping_liveness_local.sh 5
failureThreshold: 5
initialDelaySeconds: 20
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 6
name: redis
ports:
- containerPort: 6379
name: redis
protocol: TCP
readinessProbe:
exec:
command:
- sh
- -c
- /health/ping_readiness_local.sh 1
failureThreshold: 5
initialDelaySeconds: 20
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 2
resources: {}
securityContext:
runAsUser: 1001
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /opt/bitnami/scripts/start-scripts
name: start-scripts
- mountPath: /health
name: health
- mountPath: /data
name: redis-data
- mountPath: /opt/bitnami/redis/mounted-etc
name: config
- mountPath: /opt/bitnami/redis/etc/
name: redis-tmp-conf
- mountPath: /tmp
name: tmp
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1001
serviceAccount: test-redis
serviceAccountName: test-redis
terminationGracePeriodSeconds: 30
volumes:
- configMap:
defaultMode: 493
name: test-redis-scripts
name: start-scripts
- configMap:
defaultMode: 493
name: test-redis-health
name: health
- configMap:
defaultMode: 420
name: test-redis-configuration
name: config
- emptyDir: {}
name: redis-tmp-conf
- emptyDir: {}
name: tmp
updateStrategy:
rollingUpdate:
partition: 0
type: RollingUpdate
volumeClaimTemplates:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/component: master
app.kubernetes.io/instance: test-redis
app.kubernetes.io/name: redis
name: redis-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 8Gi
volumeMode: Filesystem
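A sketch of rolling this out and checking the repair, assuming the edited manifest above is saved as test-redis-master-with-init.yaml (hypothetical filename):
kubectl -n test apply -f test-redis-master-with-init.yaml
kubectl -n test rollout status statefulset/test-redis-master
# the init container's logs show what redis-check-aof did:
kubectl -n test logs test-redis-master-0 -c repair-redis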
I got this error when deploying a k8s Deployment. I tried running as the root user via the security context, but it didn't help. Any guess how to solve it? Unfortunately, I don't have any other ideas or a workaround to avoid this permission issue.
The error I get is:
30: line 1: /scripts/wrapper.sh: Permission denied
stream closed
The deployment is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
name: cluster-autoscaler-grok-exporter
labels:
app: cluster-autoscaler-grok-exporter
spec:
replicas: 1
selector:
matchLabels:
app: cluster-autoscaler-grok-exporter
sidecar: cluster-autoscaler-grok-exporter-sidecar
template:
metadata:
labels:
app: cluster-autoscaler-grok-exporter
sidecar: cluster-autoscaler-grok-exporter-sidecar
spec:
securityContext:
runAsUser: 1001
fsGroup: 2000
serviceAccountName: flux
imagePullSecrets:
- name: id-docker
containers:
- name: get-data
# 3.5.0 - helm v3.5.0, kubectl v1.20.2, alpine 3.12
image: dtzar/helm-kubectl:3.5.0
command: ["sh", "-c", "/scripts/wrapper.sh"]
args:
- cluster-autoscaler
- "90"
# - cluster-autoscaler
- "30"
- /scripts/get_data.sh
- /logs/data.log
volumeMounts:
- name: logs
mountPath: /logs/
- name: scripts-volume-get-data
mountPath: /scripts/get_data.sh
subPath: get_data.sh
- name: scripts-wrapper
mountPath: /scripts/wrapper.sh
subPath: wrapper.sh
- name: export-data
image: ippendigital/grok-exporter:1.0.0.RC3
imagePullPolicy: Always
ports:
- containerPort: 9148
protocol: TCP
volumeMounts:
- name: grok-config-volume
mountPath: /grok/config.yml
subPath: config.yml
- name: logs
mountPath: /logs
volumes:
- name: grok-config-volume
configMap:
name: grok-exporter-config
- name: scripts-volume-get-data
configMap:
name: get-data-script
defaultMode: 0777
defaultMode: 0700
- name: scripts-wrapper
configMap:
name: wrapper-config
defaultMode: 0777
defaultMode: 0700
- name: logs
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: cluster-autoscaler-grok-exporter-sidecar
labels:
sidecar: cluster-autoscaler-grok-exporter-sidecar
spec:
type: ClusterIP
ports:
- name: metrics
protocol: TCP
targetPort: 9144
port: 9148
selector:
sidecar: cluster-autoscaler-grok-exporter-sidecar
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
app.kubernetes.io/name: cluster-autoscaler-grok-exporter
app.kubernetes.io/part-of: grok-exporter
name: cluster-autoscaler-grok-exporter
spec:
endpoints:
- port: metrics
selector:
matchLabels:
sidecar: cluster-autoscaler-grok-exporter-sidecar
From what I can see, your script does not have execute permissions.
Remove this line from your configMap volume definition:
defaultMode: 0700
Keep only:
defaultMode: 0777
Also, I see a missing leading / in your script path:
- /bin/sh scripts/get_data.sh
So change it to:
- /bin/sh /scripts/get_data.sh
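A quick way to confirm the fix took effect once the Deployment has been re-applied, as a sketch:
# the mounted scripts should now show the execute bit (e.g. -rwxrwxrwx)
kubectl exec deploy/cluster-autoscaler-grok-exporter -c get-data -- ls -l /scripts/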
I'm learning SQL Server BDC on minikube using this article as a guide. I tried deploying the below YAML file by running: kubectl apply -f deployment.yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: mssql-deployment
spec:
replicas: 1
template:
metadata:
labels:
app: mssql
spec:
terminationGracePeriodSeconds: 10
containers:
- name: mssql
image: microsoft/mssql-server-linux
ports:
- containerPort: 1433
securityContext:
privileged: true
env:
- name: ACCEPT_EULA
value: "Y"
- name: SA_PASSWORD
valueFrom:
secretKeyRef:
name: mssql
key: SA_PASSWORD
volumeMounts:
- name: mssqldb
mountPath: /var/opt/mssql
volumes:
- name: mssqldb
persistentVolumeClaim:
claimName: pvc0001
It errored due to the apps/v1beta1 apiVersion. I converted this YAML file by running kubectl convert -f deployment.yaml and got the below manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
creationTimestamp: null
name: mssql-deployment
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector: null
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: mssql
spec:
containers:
- env:
- name: ACCEPT_EULA
value: "Y"
- name: SA_PASSWORD
valueFrom:
secretKeyRef:
key: SA_PASSWORD
name: mssql
image: microsoft/mssql-server-linux
imagePullPolicy: Always
name: mssql
ports:
- containerPort: 1433
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 10
status: {}
But when I deploy the above script I get:
Error validating "deployment.yaml": error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false
It is related to matchLabels/matchExpressions, but I'm not able to address it. Can someone point me in the right direction?
You need to add a selector in the spec section of the Deployment; it's a mandatory field. The .spec.selector field defines how the Deployment finds which Pods to manage. In this case, you simply select a label that is defined in the Pod template (app: mssql). However, more sophisticated selection rules are possible, as long as the Pod template itself satisfies the rule.
apiVersion: apps/v1
kind: Deployment
metadata:
creationTimestamp: null
name: mssql-deployment
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: mssql
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: mssql
spec:
containers:
- env:
- name: ACCEPT_EULA
value: "Y"
- name: SA_PASSWORD
valueFrom:
secretKeyRef:
key: SA_PASSWORD
name: mssql
image: microsoft/mssql-server-linux
imagePullPolicy: Always
name: mssql
ports:
- containerPort: 1433
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 10
status: {}
missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec
You need a selector to tell the Deployment which Pods it manages.
Solution:
selector:
matchLabels:
app: mssql
template:
metadata:
labels:
app: mssql
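To confirm the selector and the template labels line up after applying the fixed manifest, a short sketch:
kubectl apply -f deployment.yaml
# the selector must match the Pod template labels, so this should list the pod:
kubectl get pods -l app=mssql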
Environment:
kubernetes provider: gke
kubernetes version: v1.13.12-gke.25
grafana version: 6.6.2 (official image)
Grafana Deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
name: grafana
labels:
app: grafana
spec:
containers:
- name: grafana
image: grafana/grafana:6.6.2
ports:
- name: grafana
containerPort: 3000
# securityContext:
# runAsUser: 104
# allowPrivilegeEscalation: true
resources:
limits:
memory: "1Gi"
cpu: "500m"
requests:
memory: "500Mi"
cpu: "100m"
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-storage
volumes:
- name: grafana-storage
persistentVolumeClaim:
claimName: grafana-pvc
Problem
When I deployed this Grafana dashboard the first time, it was working fine. After some time, I restarted the pod to check whether the volume mount was working or not. After restarting, I am getting the below error:
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
What I understand from this error is that the user cannot create these files. How can I give this user the appropriate permissions so that Grafana starts successfully?
I recreated your deployment with an appropriate PVC and noticed that the grafana pod was failing.
Output of command: $ kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
grafana-6466cd95b5-4g95f 0/1 Error 2 65s
Further investigation showed the same errors as yours:
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
This error showed up on the first creation of the pod and the deployment; there was no need to recreate any pods.
What I did to make it work was to edit your deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
name: grafana
labels:
app: grafana
spec:
securityContext:
runAsUser: 472
fsGroup: 472
containers:
- name: grafana
image: grafana/grafana:6.6.2
ports:
- name: grafana
containerPort: 3000
resources:
limits:
memory: "1Gi"
cpu: "500m"
requests:
memory: "500Mi"
cpu: "100m"
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-storage
volumes:
- name: grafana-storage
persistentVolumeClaim:
claimName: grafana-pvc
Please take a specific look at this part:
securityContext:
runAsUser: 472
fsGroup: 472
It is a setting described in the official documentation: Kubernetes.io: Set the security context for a pod
Please take a look at this GitHub issue, which is similar to yours and pointed me to the solution that allowed the pod to spawn correctly:
https://github.com/grafana/grafana-docker/issues/167
Grafana had some major updates starting from version 5.1. Please take a look: Grafana.com: Docs: Migrate to v5.1 or later
Please let me know if this helps.
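A quick verification sketch once the pod is running with the securityContext above: the container should report uid 472 and be able to write to the data directory.
kubectl -n monitoring exec deploy/grafana -- id
kubectl -n monitoring exec deploy/grafana -- ls -ld /var/lib/grafana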
On v8.0, I fixed it by setting runAsUser: 0.
It works.
---
apiVersion: v1
kind: Service
metadata:
name: grafana
spec:
ports:
- name: grafana-tcp
port: 3000
protocol: TCP
targetPort: 3000
selector:
project: grafana
type: LoadBalancer
status:
loadBalancer: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
project: grafana
name: grafana
spec:
replicas: 1
selector:
matchLabels:
project: grafana
strategy:
type: RollingUpdate
template:
metadata:
labels:
project: grafana
name: grafana
spec:
securityContext:
runAsUser: 0
containers:
- image: grafana/grafana
name: grafana
ports:
- containerPort: 3000
protocol: TCP
resources: {}
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-volume
volumes:
- name: grafana-volume
hostPath:
# directory location on host
path: /opt/grafana
# this field is optional
type: DirectoryOrCreate
restartPolicy: Always
status: {}