Kubernetes pod crashes with Error / CrashLoopBackOff

I am trying to create a "logstash 6.5.4" pod using this file (cdn_akamai.yml) on machines with 16 GB of memory and 8 vCPUs:
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: cdn-akamai-pipe
spec:
  template:
    metadata:
      labels:
        app: cdn-akamai-pipe
    spec:
      securityContext:
        runAsUser: 0
        runAsGroup: 0
      hostname: cdn-akamai-pipe
      containers:
        - name: cdn-akamai-pipe
          resources:
            limits:
              memory: "1Gi"
            requests:
              memory: "1Gi"
          ports:
            - containerPort: 9600
          image: docker.elastic.co/logstash/logstash:6.5.4
          volumeMounts:
            - name: cdn-akamai-pipe-config
              mountPath: /usr/share/logstash/pipeline/cdn_akamai.conf
              subPath: cdn_akamai.conf
            - name: logstash-jvm-options-config
              mountPath: /usr/share/logstash/config/jvm.options
              subPath: jvm.options
            - name: pipeline-config
              mountPath: /usr/share/logstash/config/pipelines.yml
              subPath: pipelines.yml
          command:
            - logstash
      volumes:
        - name: cdn-akamai-pipe-config
          configMap:
            name: cdn-akamai-pipe
        - name: logstash-jvm-options-config
          configMap:
            name: logstash-jvm-options
        - name: pipeline-config
          configMap:
            name: pipeline-akamai
---
kind: Service
apiVersion: v1
metadata:
  name: cdn-akamai-pipe
spec:
  type: NodePort
  selector:
    app: cdn-akamai-pipe
  ports:
    - protocol: TCP
      port: 9600
      targetPort: 9600
      name: logstash
and I am using the following commands:
kubectl create configmap logstash-jvm-options --from-file jvm.options
kubectl create configmap cdn-akamai-pipe --from-file cdn_akamai.conf
kubectl create configmap pipeline-akamai --from-file pipelines.yml
kubectl create -f cdn_akamai.yml
where the files contain the following:
Data
====
pipelines.yml:
----
- pipeline.id: main
  path.config: "/usr/share/logstash/pipeline/cdn_akamai.conf
Data
====
jvm.options:
----
-Xms800m
-Xmx800m
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djruby.compile.invokedynamic=true
-Djruby.jit.threshold=0
-XX:+HeapDumpOnOutOfMemoryError
-Djava.security.egd=file:/dev/urandom
Data
====
cdn_akamai.conf:
----
input {stdin{}}
output {stdout{}}
But the pod keeps ending up in Error or CrashLoopBackOff:
cdn-akamai-pipe-74c64757b9-t4k2f 0/1 Error 7 12m
I ran a hello-world pod to verify that my cluster is OK, and that pod runs perfectly.
Could you help me, please? Is there any problem with my volumes, service, or syntax? Do you see anything wrong?
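For completeness, the actual crash reason should show up in the previous container log and in the pod events; using the pod name from the listing above, something like this should surface it:
kubectl logs cdn-akamai-pipe-74c64757b9-t4k2f --previous
kubectl describe pod cdn-akamai-pipe-74c64757b9-t4k2f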

Related

Access kubernetes job pods using hostname

I am trying to run a k8s Job with 2 pods in which one pod tries to connect to the other.
I cannot connect to the other pod using the pod's hostname as suggested in the docs - https://kubernetes.io/docs/concepts/workloads/controllers/job/#completion-mode.
I have created a service and I am trying to reach the pod as k8s-train-0.default.svc.cluster.local, as mentioned in the document.
apiVersion: batch/v1
kind: Job
metadata:
  name: k8s-train
spec:
  parallelism: 2
  completions: 2
  completionMode: Indexed
  manualSelector: true
  selector:
    matchLabels:
      app.kubernetes.io/name: proxy
  template:
    metadata:
      labels:
        app.kubernetes.io/name: proxy
    spec:
      containers:
        - name: k8s-train
          image: pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime
          command: ["/bin/sh","-c"]
          args:
            - echo starting;
              export MASTER_PORT=54321;
              export MASTER_ADDR=k8s-train-0.trainsvc.default.svc.cluster.local;
              export WORLD_SIZE=8;
              pip install -r /data/requirements.txt;
              export NCCL_DEBUG=INFO;
              python /data/bert.py --strategy=ddp --num_nodes=2 --gpus=4 --max_epochs=3;
              echo done;
          env:
            - name: MY_POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          ports:
            - containerPort: 54321
              name: master-port
          resources:
            requests:
              nvidia.com/gpu: 4
            limits:
              nvidia.com/gpu: 4
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: efs-claim
      restartPolicy: Never
  backoffLimit: 0
---
apiVersion: v1
kind: Service
metadata:
  name: trainsvc
spec:
  selector:
    app.kubernetes.io/name: proxy
  ports:
    - name: master-svc-port
      protocol: TCP
      port: 54321
      targetPort: master-port
  clusterIP: None
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
I am looking to establish communication between the pods either by using the hostname, or by assigning the svc to only one pod selected by its job index.
Please let me know if I'm missing something here.
Thanks.
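For what it's worth, the per-pod DNS names described in the linked docs rely on the Pod having both a hostname and a subdomain matching the headless Service: with Indexed completion mode the hostname is set automatically to <job-name>-<index>, but the subdomain still has to be set in the Job's pod template. A minimal sketch of that fragment, assuming the Job and headless Service above:
spec:
  template:
    spec:
      subdomain: trainsvc  # must match the headless Service name so that
                           # k8s-train-0.trainsvc.default.svc.cluster.local resolves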

How to run DNS Server (dnsmasq) in Kubernetes?

I'm trying to run a DNS server (Dnsmasq) in a Kubernetes cluster. The cluster has only one node. Everything works fine until I need to restart the dnsmasq container (kubectl rollout restart daemonsets dnsmasq-daemonset) to apply changes made to the hosts ConfigMap. As I found out, this is needed because an already running Dnsmasq will not otherwise pick up changes made to the hosts ConfigMap.
As soon as the container is restarted, it is not able to pull the dnsmasq image and it fails. This is expected behavior, since it cannot resolve the image name while no other DNS servers are running, but I wonder what the best way around this is, and what the best practices are for running a DNS server in Kubernetes in general. Is this something that CoreDNS is used for, or what other alternatives are there? Maybe some high-availability solution?
hosts ConfigMap:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: dnsmasq-hosts
  namespace: core
data:
  hosts: |
    127.0.0.1 localhost
    10.x.x.x example.com
    ...
Dnsmasq deployment:
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dnsmasq-daemonset
  namespace: core
spec:
  selector:
    matchLabels:
      app: dnsmasq-app
  template:
    metadata:
      labels:
        app: dnsmasq-app
      namespace: core
    spec:
      containers:
        - name: dnsmasq
          image: registry.gitlab.com/path/to/dnsmasqImage:tag
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: "1"
              memory: "32Mi"
            requests:
              cpu: "150m"
              memory: "16Mi"
          ports:
            - name: dns
              containerPort: 53
              hostPort: 53
              protocol: UDP
          volumeMounts:
            - name: conf-dnsmasq
              mountPath: /etc/dnsmasq.conf
              subPath: dnsmasq.conf
              readOnly: true
            - name: dnsconf-dnsmasq
              mountPath: /etc/dnsmasq.d/dns.conf
              subPath: dns.conf
              readOnly: true
            - name: hosts-dnsmasq
              mountPath: /etc/dnsmasq.d/hosts
              subPath: hosts
              readOnly: true
      volumes:
        - name: conf-dnsmasq
          configMap:
            name: dnsmasq-conf
        - name: dnsconf-dnsmasq
          configMap:
            name: dnsmasq-dnsconf
        - name: hosts-dnsmasq
          configMap:
            name: dnsmasq-hosts
      imagePullSecrets:
        - name: gitlab-registry-credentials
      nodeSelector:
        kubernetes.io/hostname: master
      restartPolicy: Always
I tried to use imagePullPolicy: Never, but it seems to fail anyway.
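One diagnostic that may help narrow this down (a sketch only, the exact command depends on the node's container runtime): with imagePullPolicy: IfNotPresent or Never the restart only avoids the registry if the image is already cached on that node, so it is worth confirming the image is actually present there:
docker images | grep dnsmasq     # node with a Docker runtime
crictl images | grep dnsmasq     # node with containerd / CRI-O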

K8s mounting persistentVolume failed, "timed out waiting for the condition" on docker-desktop

When trying to bind a pod to an NFS persistentVolume hosted on another pod, it fails to mount when using docker-desktop. It works perfectly fine elsewhere, even with the exact same YAML.
The error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m59s default-scheduler Successfully assigned test-project/test-digit-5576c79688-zfg8z to docker-desktop
Warning FailedMount 2m56s kubelet Unable to attach or mount volumes: unmounted volumes=[lagg-connection], unattached volumes=[lagg-connection kube-api-access-h68w7]: timed out waiting for the condition
Warning FailedMount 37s kubelet Unable to attach or mount volumes: unmounted volumes=[lagg-connection], unattached volumes=[kube-api-access-h68w7 lagg-connection]: timed out waiting for the condition
The minimal project, which you can apply to test it yourself:
apiVersion: v1
kind: Namespace
metadata:
  name: test-project
  labels:
    name: test-project
---
apiVersion: v1
kind: Service
metadata:
  labels:
    environment: test
  name: test-lagg
  namespace: test-project
spec:
  clusterIP: 10.96.13.37
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
  selector:
    app: nfs-server
    environment: test
    scope: backend
---
apiVersion: v1
kind: PersistentVolume
metadata:
  labels:
    environment: test
  name: test-lagg-volume
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 2Gi
  nfs:
    path: /
    server: 10.96.13.37
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    environment: test
  name: test-lagg-claim
  namespace: test-project
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: ""
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: static
    environment: test
    scope: backend
  name: test-digit
  namespace: test-project
spec:
  selector:
    matchLabels:
      app: static
      environment: test
      scope: backend
  template:
    metadata:
      labels:
        app: static
        environment: test
        scope: backend
    spec:
      containers:
        - image: busybox
          name: digit
          imagePullPolicy: IfNotPresent
          command: ['sh', '-c', 'echo Container 1 is Running ; sleep 3600']
          volumeMounts:
            - mountPath: /cache
              name: lagg-connection
      volumes:
        - name: lagg-connection
          persistentVolumeClaim:
            claimName: test-lagg-claim
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    environment: test
  name: test-lagg
  namespace: test-project
spec:
  selector:
    matchLabels:
      app: nfs-server
      environment: test
      scope: backend
  template:
    metadata:
      labels:
        app: nfs-server
        environment: test
        scope: backend
    spec:
      containers:
        - image: gcr.io/google_containers/volume-nfs:0.8
          name: lagg
          ports:
            - containerPort: 2049
              name: lagg
            - containerPort: 20048
              name: mountd
            - containerPort: 111
              name: rpcbind
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /exports
              name: lagg-claim
      volumes:
        - emptyDir: {}
          name: lagg-claim
As well as emptyDir, I have also tried hostPath. This setup has worked before, and I'm not sure what, if anything, I've changed since it stopped working.
Updating my Docker for Windows installation from 4.0.1 to 4.1.1 has fixed this problem.
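In case it helps anyone verifying the same fix: once the pods are running, the NFS mount can be confirmed from inside the consumer pod (the pod name is whatever kubectl get pods -n test-project reports for test-digit):
kubectl -n test-project get pods
kubectl -n test-project exec <test-digit-pod-name> -- mount | grep nfs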

Kubernetes MountVolume.NewMounter initialization failed for volume [name] : path [name] does not exist

I am trying to deploy an Elasticsearch cluster on Kubernetes; for that I am using local persistent volumes.
Here are my manifest files:
persistantvolume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /home/kb/data
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - minikube
storage.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: es
  labels:
    service: elasticsearch
spec:
  clusterIP: None
  ports:
    - port: 9200
      name: serving
    - port: 9300
      name: node-to-node
  selector:
    service: elasticsearch
elasticsearch.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  labels:
    service: elasticsearch
spec:
  serviceName: es
  replicas: 3
  selector:
    matchLabels:
      service: elasticsearch
  template:
    metadata:
      labels:
        service: elasticsearch
    spec:
      terminationGracePeriodSeconds: 300
      initContainers:
        - name: fix-the-volume-permission
          image: busybox
          command:
            - sh
            - -c
            - chown -R 1000:1000 /usr/share/elasticsearch/data
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
        - name: increase-the-vm-max-map-count
          image: busybox
          command:
            - sysctl
            - -w
            - vm.max_map_count=262144
          securityContext:
            privileged: true
        - name: increase-the-ulimit
          image: busybox
          command:
            - sh
            - -c
            - ulimit -n 65536
          securityContext:
            privileged: true
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.2.4
          ports:
            - containerPort: 9200
              name: http
            - containerPort: 9300
              name: tcp
          resources:
            requests:
              memory: 4Gi
            limits:
              memory: 6Gi
          env:
            - name: cluster.name
              value: elasticsearch-cluster
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: discovery.zen.ping.unicast.hosts
              value: "elasticsearch-0.es.default.svc.cluster.local,elasticsearch-1.es.default.svc.cluster.local,elasticsearch-2.es.default.svc.cluster.local"
            - name: ES_JAVA_OPTS
              value: -Xms4g -Xmx4g
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: local-storage
        resources:
          requests:
            storage: 10Gi
kubectl apply -f persistantvolume.yaml
kubectl apply -f storage.yaml
kubectl apply -f service.yaml
kubectl apply -f elasticsearch.yaml
My pod is stuck in the Init:0/3 state, and kubectl describe pod <podname> shows:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 44s default-scheduler Successfully assigned default/elasticsearch-0 to minikube
Warning FailedMount 12s (x7 over 44s) kubelet MountVolume.NewMounter initialization failed for volume "example-local-pv" : path "/home/kb/data" does not exist
I am a beginner in Kubernetes; please help me figure out what I am missing here. /home/kb/data does exist on my local drive.
Assuming you launched minikube with one of its VM drivers, the /home/kb/data directory exists on your local drive but probably NOT inside the minikube VM. Does that make sense? The Kubernetes local-storage volume type won't create missing directories. If you JUST want to "fix the error", then minikube ssh -- mkdir /home/kb/data might do the trick. This answer explains more of the background details.
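A quick way to apply that suggestion and confirm it took effect inside the VM (assuming the default minikube profile) would be:
minikube ssh -- mkdir -p /home/kb/data
minikube ssh -- ls -ld /home/kb/data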

How can I give grafana user appropriate permission so that it can start successfully?

env:
kubernetes provider: gke
kubernetes version: v1.13.12-gke.25
grafana version: 6.6.2 (official image)
grafana deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      name: grafana
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:6.6.2
          ports:
            - name: grafana
              containerPort: 3000
          # securityContext:
          #   runAsUser: 104
          #   allowPrivilegeEscalation: true
          resources:
            limits:
              memory: "1Gi"
              cpu: "500m"
            requests:
              memory: "500Mi"
              cpu: "100m"
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-storage
      volumes:
        - name: grafana-storage
          persistentVolumeClaim:
            claimName: grafana-pvc
Problem
When I deployed this Grafana dashboard for the first time, it worked fine. After some time I restarted the pod to check whether the volume mount was working or not. After restarting, I got the error below.
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
What I understand from this error is that the user cannot create these files. How can I give this user the appropriate permissions so that Grafana starts successfully?
I recreated your deployment with an appropriate PVC and noticed that the grafana pod was failing.
Output of command: $ kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
grafana-6466cd95b5-4g95f 0/1 Error 2 65s
Further investigation pointed to the same errors as yours:
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
This error showed up on the first creation of the pod and the deployment; there was no need to recreate any pods.
What I did to make it work was to edit your deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      name: grafana
      labels:
        app: grafana
    spec:
      securityContext:
        runAsUser: 472
        fsGroup: 472
      containers:
        - name: grafana
          image: grafana/grafana:6.6.2
          ports:
            - name: grafana
              containerPort: 3000
          resources:
            limits:
              memory: "1Gi"
              cpu: "500m"
            requests:
              memory: "500Mi"
              cpu: "100m"
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-storage
      volumes:
        - name: grafana-storage
          persistentVolumeClaim:
            claimName: grafana-pvc
Please take a specific look at this part:
securityContext:
  runAsUser: 472
  fsGroup: 472
It is a setting described in the official documentation: Kubernetes.io: Set the security context for a pod.
Please take a look at this GitHub issue, which is similar to yours and pointed me to the solution that allowed the pod to spawn correctly:
https://github.com/grafana/grafana-docker/issues/167
Grafana had some major updates starting from version 5.1. Please take a look: Grafana.com: Docs: Migrate to v5.1 or later
Please let me know if this helps.
On v8.0, I did it by setting runAsUser: 0.
It works.
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  ports:
    - name: grafana-tcp
      port: 3000
      protocol: TCP
      targetPort: 3000
  selector:
    project: grafana
  type: LoadBalancer
status:
  loadBalancer: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    project: grafana
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      project: grafana
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        project: grafana
      name: grafana
    spec:
      securityContext:
        runAsUser: 0
      containers:
        - image: grafana/grafana
          name: grafana
          ports:
            - containerPort: 3000
              protocol: TCP
          resources: {}
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-volume
      volumes:
        - name: grafana-volume
          hostPath:
            # directory location on host
            path: /opt/grafana
            # this field is optional
            type: DirectoryOrCreate
      restartPolicy: Always
status: {}
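If running Grafana as root is not desirable, a possible alternative (a sketch only, not tested here) is to keep runAsUser: 472 / fsGroup: 472 from the first answer and fix the hostPath ownership once with an initContainer in the pod spec:
initContainers:
  - name: fix-grafana-permissions
    image: busybox
    # run only this step as root so the main container can stay non-root
    securityContext:
      runAsUser: 0
    # chown the data directory to the grafana user/group (uid/gid 472)
    command: ["sh", "-c", "chown -R 472:472 /var/lib/grafana"]
    volumeMounts:
      - name: grafana-volume
        mountPath: /var/lib/grafana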