I want my prometheus server to scrape metrics from a pod.
I followed these steps:
Created a pod using deployment - kubectl apply -f sample-app.deploy.yaml
Exposed the same using kubectl apply -f sample-app.service.yaml
Deployed Prometheus server using helm upgrade -i prometheus prometheus-community/prometheus -f prometheus-values.yaml
created a serviceMonitor using kubectl apply -f service-monitor.yaml to add a target for prometheus.
All pods are running, but when I open prometheus dashboard, I don't see sample-app service as prometheus target, under status>targets in dashboard UI.
I've verified following:
I can see sample-app when I execute kubectl get servicemonitors
I can see sample-app exposes metrics in prometheus format under at /metrics
At this point I debugged further, entered into the prometheus pod using
kubectl exec -it pod/prometheus-server-65b759cb95-dxmkm -c prometheus-server sh
, and saw that proemetheus configuration (/etc/config/prometheus.yml) didn't have sample-app as one of the jobs so I edited the configmap using
kubectl edit cm prometheus-server -o yaml
Added
- job_name: sample-app
static_configs:
- targets:
- sample-app:8080
Assuming all other fields such as scraping interval, scrape_timeout stays default.
I can see the same has been reflected in /etc/config/prometheus.yml, but still prometheus dashboard doesn't show sample-app as targets under status>targets.
following are yamls for prometheus-server and service monitor.
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"name":"prometheus-server-configmap-reload"},{"name":"prometheus-server"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server-configmap-reload"},{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server"}]},"modified":true}'
deployment.kubernetes.io/revision: "1"
meta.helm.sh/release-name: prometheus
meta.helm.sh/release-namespace: prom
creationTimestamp: "2021-06-24T10:42:31Z"
generation: 1
labels:
app: prometheus
app.kubernetes.io/managed-by: Helm
chart: prometheus-14.2.1
component: server
heritage: Helm
release: prometheus
name: prometheus-server
namespace: prom
resourceVersion: "6983855"
selfLink: /apis/apps/v1/namespaces/prom/deployments/prometheus-server
uid: <some-uid>
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: prometheus
component: server
release: prometheus
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: prometheus
chart: prometheus-14.2.1
component: server
heritage: Helm
release: prometheus
spec:
containers:
- args:
- --volume-dir=/etc/config
- --webhook-url=http://127.0.0.1:9090/-/reload
image: jimmidyson/configmap-reload:v0.5.0
imagePullPolicy: IfNotPresent
name: prometheus-server-configmap-reload
resources:
limits:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
requests:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
securityContext:
capabilities:
drop:
- NET_RAW
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/config
name: config-volume
readOnly: true
- args:
- --storage.tsdb.retention.time=15d
- --config.file=/etc/config/prometheus.yml
- --storage.tsdb.path=/data
- --web.console.libraries=/etc/prometheus/console_libraries
- --web.console.templates=/etc/prometheus/consoles
- --web.enable-lifecycle
image: quay.io/prometheus/prometheus:v2.26.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /-/healthy
port: 9090
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 15
successThreshold: 1
timeoutSeconds: 10
name: prometheus-server
ports:
- containerPort: 9090
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /-/ready
port: 9090
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 4
resources:
limits:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
requests:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
securityContext:
capabilities:
drop:
- NET_RAW
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/config
name: config-volume
- mountPath: /data
name: storage-volume
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 65534
runAsGroup: 65534
runAsNonRoot: true
runAsUser: 65534
seccompProfile:
type: RuntimeDefault
serviceAccount: prometheus-server
serviceAccountName: prometheus-server
terminationGracePeriodSeconds: 300
volumes:
- configMap:
defaultMode: 420
name: prometheus-server
name: config-volume
- name: storage-volume
persistentVolumeClaim:
claimName: prometheus-server
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2021-06-24T10:43:25Z"
lastUpdateTime: "2021-06-24T10:43:25Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2021-06-24T10:42:31Z"
lastUpdateTime: "2021-06-24T10:43:25Z"
message: ReplicaSet "prometheus-server-65b759cb95" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 1
readyReplicas: 1
replicas: 1
updatedReplicas: 1
yaml for service Monitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"creationTimestamp":"2021-06-24T07:55:58Z","generation":1,"labels":{"app":"sample-app","release":"prometheus"},"name":"sample-app","namespace":"prom","resourceVersion":"6884573","selfLink":"/apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app","uid":"34644b62-eb4f-4ab1-b9df-b22811e40b4c"},"spec":{"endpoints":[{"port":"http"}],"selector":{"matchLabels":{"app":"sample-app","release":"prometheus"}}}}
creationTimestamp: "2021-06-24T07:55:58Z"
generation: 2
labels:
app: sample-app
release: prometheus
name: sample-app
namespace: prom
resourceVersion: "6904642"
selfLink: /apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app
uid: <some-uid>
spec:
endpoints:
- port: http
selector:
matchLabels:
app: sample-app
release: prometheus
You need to use the prometheus-community/kube-prometheus-stack chart, which includes the Prometheus operator, in order to have Prometheus' configuration update automatically based on ServiceMonitor resources.
The prometheus-community/prometheus chart you used does not include the Prometheus operator that watches for ServiceMonitor resources in the Kubernetes API and updates the Prometheus server's ConfigMap accordingly.
It seems that you have the necessary CustomResourceDefinitions (CRDs) installed in your cluster, otherwise you would not have been able to create a ServiceMonitor resource. These are not included in the prometheus-community/prometheus chart so perhaps they were added to your cluster previously.
I've got a preemptible node pool which is clearly under-utilized:
The node pool hosts a deployment with HPA with the following setup:
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend
labels:
app: backend
spec:
replicas: 1
selector:
matchLabels:
app: backend
template:
metadata:
labels:
app: backend
spec:
initContainers:
- name: wait-for-database
image: ### IMAGE ###
command: ['bash', 'init.sh']
containers:
- name: backend
image: ### IMAGE ###
command: ["bash", "entrypoint.sh"]
imagePullPolicy: Always
resources:
requests:
memory: "200M"
cpu: "50m"
ports:
- name: probe-port
containerPort: 8080
hostPort: 8080
volumeMounts:
- name: static-shared-data
mountPath: /static
readinessProbe:
httpGet:
path: /readiness/
port: probe-port
failureThreshold: 5
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
- name: nginx
image: nginx:alpine
resources:
requests:
memory: "400M"
cpu: "20m"
ports:
- containerPort: 80
volumeMounts:
- name: nginx-proxy-config
mountPath: /etc/nginx/conf.d/default.conf
subPath: app.conf
- name: static-shared-data
mountPath: /static
volumes:
- name: nginx-proxy-config
configMap:
name: backend-nginx
- name: static-shared-data
emptyDir: {}
nodeSelector:
cloud.google.com/gke-nodepool: app-dev
tolerations:
- effect: NoSchedule
key: workload
operator: Equal
value: dev
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: backend
namespace: default
spec:
maxReplicas: 12
minReplicas: 8
scaleTargetRef:
apiVersion: extensions/v1beta1
kind: Deployment
name: backend
metrics:
- resource:
name: cpu
targetAverageUtilization: 50
type: Resource
---
The node pool also has the toleration label.
The HPA utilization shows this:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
backend-develop Deployment/backend-develop 10%/50% 8 12 8 38d
But the node pool does not scale down for about a day. No heavy load on this deployment:
NAME STATUS ROLES AGE VERSION
gke-dev-app-dev-fee1a901-fvw9 Ready <none> 22h v1.14.10-gke.36
gke-dev-app-dev-fee1a901-gls7 Ready <none> 22h v1.14.10-gke.36
gke-dev-app-dev-fee1a901-lf3f Ready <none> 24h v1.14.10-gke.36
gke-dev-app-dev-fee1a901-lgw9 Ready <none> 3d10h v1.14.10-gke.36
gke-dev-app-dev-fee1a901-qxkz Ready <none> 3h35m v1.14.10-gke.36
gke-dev-app-dev-fee1a901-s10l Ready <none> 22h v1.14.10-gke.36
gke-dev-app-dev-fee1a901-sj4d Ready <none> 22h v1.14.10-gke.36
gke-dev-app-dev-fee1a901-vdnw Ready <none> 27h v1.14.10-gke.36
There's no affinity settings for this deployment and node pool. Some of the nodes easily pack several same pods, but others just hold one pod for hours, no scale down happens.
What could be wrong?
The issue was:
hostPort: 8080
This lead to FailedScheduling didn't have free ports.
That's why the nodes were kept online.
I have got an issue with deploy some pods on my k8s node. The error is following:
Failed create pod sandbox: rpc error: code = Unknown desc = failed to
set up sandbox container
"7da8bce09dd6820a65754073b1b4e52e640291dcb82f1da87ae99570c6964d1b"
network for pod "webservices-8675d4667d-7mdf9": networkPlugin cni
failed to set up pod "webservices-8675d4667d-7mdf9_default" network:
Get https://[10.233.0.1]:443/api/v1/namespaces/default: dial tcp
10.233.0.1:443: i/o timeout
However, some pods are deployed, for example kubernetes-dashboard:
Update:
NAME STATUS ROLES AGE VERSION LABELS
k8s-master.mariyo.eu Ready master 3d15h v1.16.6 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master.mariyo.eu,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node-1.mariyo.eu Ready <none> 3d15h v1.16.6 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1.mariyo.eu,kubernetes.io/os=linux
Deployment for coredns:
kind: Deployment
apiVersion: apps/v1
metadata:
name: coredns
namespace: kube-system
selfLink: /apis/apps/v1/namespaces/kube-system/deployments/coredns
uid: bd5451ec-2a33-443d-8519-ffcec935ac0c
resourceVersion: '397508'
generation: 2
creationTimestamp: '2020-01-24T16:14:37Z'
labels:
addonmanager.kubernetes.io/mode: Reconcile
k8s-app: kube-dns
kubernetes.io/cluster-service: 'true'
kubernetes.io/name: coredns
annotations:
deployment.kubernetes.io/revision: '1'
kubectl.kubernetes.io/last-applied-configuration: >
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile","k8s-app":"kube-dns","kubernetes.io/cluster-service":"true","kubernetes.io/name":"coredns"},"name":"coredns","namespace":"kube-system"},"spec":{"selector":{"matchLabels":{"k8s-app":"kube-dns"}},"strategy":{"rollingUpdate":{"maxSurge":"10%","maxUnavailable":0},"type":"RollingUpdate"},"template":{"metadata":{"annotations":{"seccomp.security.alpha.kubernetes.io/pod":"docker/default"},"labels":{"k8s-app":"kube-dns"}},"spec":{"affinity":{"nodeAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"preference":{"matchExpressions":[{"key":"node-role.kubernetes.io/master","operator":"In","values":[""]}]},"weight":100}]},"podAntiAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{"matchLabels":{"k8s-app":"kube-dns"}},"topologyKey":"kubernetes.io/hostname"}]}},"containers":[{"args":["-conf","/etc/coredns/Corefile"],"image":"docker.io/coredns/coredns:1.6.0","imagePullPolicy":"IfNotPresent","livenessProbe":{"failureThreshold":10,"httpGet":{"path":"/health","port":8080,"scheme":"HTTP"},"successThreshold":1,"timeoutSeconds":5},"name":"coredns","ports":[{"containerPort":53,"name":"dns","protocol":"UDP"},{"containerPort":53,"name":"dns-tcp","protocol":"TCP"},{"containerPort":9153,"name":"metrics","protocol":"TCP"}],"readinessProbe":{"failureThreshold":10,"httpGet":{"path":"/ready","port":8181,"scheme":"HTTP"},"successThreshold":1,"timeoutSeconds":5},"resources":{"limits":{"memory":"170Mi"},"requests":{"cpu":"100m","memory":"70Mi"}},"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"add":["NET_BIND_SERVICE"],"drop":["all"]},"readOnlyRootFilesystem":true},"volumeMounts":[{"mountPath":"/etc/coredns","name":"config-volume"}]}],"dnsPolicy":"Default","nodeSelector":{"beta.kubernetes.io/os":"linux"},"priorityClassName":"system-cluster-critical","serviceAccountName":"coredns","tolerations":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"},{"key":"CriticalAddonsOnly","operator":"Exists"}],"volumes":[{"configMap":{"items":[{"key":"Corefile","path":"Corefile"}],"name":"coredns"},"name":"config-volume"}]}}}}
spec:
replicas: 2
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
creationTimestamp: null
labels:
k8s-app: kube-dns
annotations:
seccomp.security.alpha.kubernetes.io/pod: docker/default
spec:
volumes:
- name: config-volume
configMap:
name: coredns
items:
- key: Corefile
path: Corefile
defaultMode: 420
containers:
- name: coredns
image: 'docker.io/coredns/coredns:1.6.0'
args:
- '-conf'
- /etc/coredns/Corefile
ports:
- name: dns
containerPort: 53
protocol: UDP
- name: dns-tcp
containerPort: 53
protocol: TCP
- name: metrics
containerPort: 9153
protocol: TCP
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
volumeMounts:
- name: config-volume
mountPath: /etc/coredns
livenessProbe:
httpGet:
path: /health
port: 8080
scheme: HTTP
timeoutSeconds: 5
periodSeconds: 10
successThreshold: 1
failureThreshold: 10
readinessProbe:
httpGet:
path: /ready
port: 8181
scheme: HTTP
timeoutSeconds: 5
periodSeconds: 10
successThreshold: 1
failureThreshold: 10
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
securityContext:
capabilities:
add:
- NET_BIND_SERVICE
drop:
- all
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false
restartPolicy: Always
terminationGracePeriodSeconds: 30
dnsPolicy: Default
nodeSelector:
beta.kubernetes.io/os: linux
serviceAccountName: coredns
serviceAccount: coredns
securityContext: {}
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: node-role.kubernetes.io/master
operator: In
values:
- ''
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
k8s-app: kube-dns
topologyKey: kubernetes.io/hostname
schedulerName: default-scheduler
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
- key: CriticalAddonsOnly
operator: Exists
priorityClassName: system-cluster-critical
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 10%
revisionHistoryLimit: 10
progressDeadlineSeconds: 600
status:
observedGeneration: 2
replicas: 2
updatedReplicas: 2
readyReplicas: 1
availableReplicas: 1
unavailableReplicas: 1
conditions:
- type: Progressing
status: 'True'
lastUpdateTime: '2020-01-24T16:14:42Z'
lastTransitionTime: '2020-01-24T16:14:37Z'
reason: NewReplicaSetAvailable
message: ReplicaSet "coredns-58687784f9" has successfully progressed.
- type: Available
status: 'False'
lastUpdateTime: '2020-01-27T17:42:57Z'
lastTransitionTime: '2020-01-27T17:42:57Z'
reason: MinimumReplicasUnavailable
message: Deployment does not have minimum availability.
Deployment for webservices:
kind: Deployment
apiVersion: apps/v1
metadata:
name: webservices
namespace: default
selfLink: /apis/apps/v1/namespaces/default/deployments/webservices
uid: da75d3d8-92f4-4d06-86d6-e2fb325806a5
resourceVersion: '398529'
generation: 1
creationTimestamp: '2020-01-27T08:05:16Z'
labels:
run: webservices
annotations:
deployment.kubernetes.io/revision: '1'
spec:
replicas: 5
selector:
matchLabels:
run: webservices
template:
metadata:
creationTimestamp: null
labels:
run: webservices
spec:
containers:
- name: webservices
image: nginx
ports:
- containerPort: 80
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: Always
restartPolicy: Always
terminationGracePeriodSeconds: 30
dnsPolicy: ClusterFirst
securityContext: {}
schedulerName: default-scheduler
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 25%
maxSurge: 25%
revisionHistoryLimit: 10
progressDeadlineSeconds: 600
status:
observedGeneration: 1
replicas: 5
updatedReplicas: 5
unavailableReplicas: 5
conditions:
- type: Available
status: 'False'
lastUpdateTime: '2020-01-27T08:05:16Z'
lastTransitionTime: '2020-01-27T08:05:16Z'
reason: MinimumReplicasUnavailable
message: Deployment does not have minimum availability.
- type: Progressing
status: 'False'
lastUpdateTime: '2020-01-27T17:52:58Z'
lastTransitionTime: '2020-01-27T17:52:58Z'
reason: ProgressDeadlineExceeded
message: ReplicaSet "webservices-8675d4667d" has timed out progressing.
Finally, I decided to reinstall nodes from Debian 10 to Ubuntu 18.04 and everything works as expected.
Thank you for your time
Problem is that kube-proxy isn't functioning correctly as I believe the 10.233.0.1 is the kubernetes api service address which it is responsible for configuring/setting up. You should check kube-proxy logs and see that it is healthy and create the iptables rules for the kubernetes services.
Take a look here: calico-timeout-pod.
I had to set the following on the worker node as well, before joining it, for it to work:
sudo sysctl net.bridge.bridge-nf-call-iptables=1
I was having a similar issue. I am using microk8s in my instance. it seems the node needs to advertise itself to the cluster. I hope it points you in the right direction (repost from github):
microk8s stop
# or for workers: sudo snap stop microk8s
sudo vim.tiny /var/snap/microk8s/current/args/kubelet
# Add this to bottom: --node-ip=<this-specific-node-lan-ip>
sudo vim.tiny /var/snap/microk8s/current/args/kube-apiserver
# Add this to bottom: --advertise-address=<this-specific-node-lan-ip>
microk8s start
# or for workers: sudo snap start microk8s
I have a jenkins image, I made service as NodeType. It works well. Since I will add more services, I need to use ingress nginx to divert traffic to different kinds of services.
At this moment, I use my win10 to set up two vms (Centos 7.5). One vm as master1, it has two internal IPv4 address (10.0.2.9 and 192.168.56.103) and one vm as worker node4 (10.0.2.6 and 192.168.56.104).
All images are local. I have downloaded into local docker image repository. The problem is that Nginx ingress does not run.
My configuration as follows:
ingress-nginx-ctl.yaml:
apiVersion: extensions/v1beta1
metadata:
name: ingress-nginx
namespace: default
spec:
replicas: 1
template:
metadata:
labels:
app: ingress-nginx
spec:
terminationGracePeriodSeconds: 60
containers:
- image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.20.0
name: ingress-nginx
imagePullPolicy: Never
ports:
- name: http
containerPort: 80
protocol: TCP
- name: https
containerPort: 443
protocol: TCP
livenessProbe:
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
args:
- /nginx-ingress-controller
- --default-backend-service=$(POD_NAMESPACE)/nginx-default-backend
ingress-nginx-res.yaml:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: my-ingress
namespace: default
spec:
rules:
- host:
http:
paths:
- path: /
backend:
serviceName: shinyinfo-jenkins-svc
servicePort: 8080
nginx-default-backend.yaml
kind: Service
apiVersion: v1
metadata:
name: nginx-default-backend
namespace: default
spec:
ports:
- port: 80
targetPort: http
selector:
app: nginx-default-backend
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
name: nginx-default-backend
namespace: default
spec:
replicas: 1
template:
metadata:
labels:
app: nginx-default-backend
spec:
terminationGracePeriodSeconds: 60
containers:
- name: default-http-backend
image: chenliujin/defaultbackend
imagePullPolicy: Never
livenessProbe:
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 30
timeoutSeconds: 5
resources:
limits:
cpu: 10m
memory: 10Mi
requests:
cpu: 10m
memory: 10Mi
ports:
- name: http
containerPort: 8080
protocol: TCP
shinyinfo-jenkins-pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: shinyinfo-jenkins
labels:
app: shinyinfo-jenkins
spec:
containers:
- name: shinyinfo-jenkins
image: shinyinfo_jenkins
imagePullPolicy: Never
ports:
- containerPort: 8080
containerPort: 50000
volumeMounts:
- mountPath: /devops/password
name: jenkins-password
- mountPath: /var/jenkins_home
name: jenkins-home
volumes:
- name: jenkins-password
hostPath:
path: /jenkins/password
- name: jenkins-home
hostPath:
path: /jenkins
shinyinfo-jenkins-svc.yaml
apiVersion: v1
kind: Service
metadata:
name: shinyinfo-jenkins-svc
labels:
name: shinyinfo-jenkins-svc
spec:
selector:
app: shinyinfo-jenkins
type: NodePort
ports:
- name: tcp
port: 8080
nodePort: 30003
There is something wrong with nginx ingress, the console output is as follows:
[master#master1 config]$ sudo kubectl apply -f ingress-nginx-ctl.yaml
service/ingress-nginx created
deployment.extensions/ingress-nginx created
[master#master1 config]$ sudo kubectl apply -f ingress-nginx-res.yaml
ingress.extensions/my-ingress created
Images is CrashLoopBackOff, Why???
[master#master1 config]$ sudo kubectl get po
NAME READY STATUS RESTARTS AGE
ingress-nginx-66df6b6d9-mhmj9 0/1 CrashLoopBackOff 1 9s
nginx-default-backend-645546c46f-x7s84 1/1 Running 0 6m
shinyinfo-jenkins 1/1 Running 0 20m
describe pod:
[master#master1 config]$ sudo kubectl describe po ingress-nginx-66df6b6d9-mhmj9
Name: ingress-nginx-66df6b6d9-mhmj9
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: node4/192.168.56.104
Start Time: Thu, 08 Nov 2018 16:45:46 +0800
Labels: app=ingress-nginx
pod-template-hash=228926285
Annotations: <none>
Status: Running
IP: 100.127.10.211
Controlled By: ReplicaSet/ingress-nginx-66df6b6d9
Containers:
ingress-nginx:
Container ID: docker://2aba164d116758585abef9d893a5fa0f0c5e23c04a13466263ce357ebe10cb0a
Image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.20.0
Image ID: docker://sha256:a3f21ec4bd119e7e17c8c8b2bf8a3b9e42a8607455826cd1fa0b5461045d2fa9
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--default-backend-service=$(POD_NAMESPACE)/nginx-default-backend
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Thu, 08 Nov 2018 16:46:09 +0800
Finished: Thu, 08 Nov 2018 16:46:09 +0800
Ready: False
Restart Count: 2
Liveness: http-get http://:10254/healthz delay=30s timeout=5s period=10s #success=1 #failure=3
Environment:
POD_NAME: ingress-nginx-66df6b6d9-mhmj9 (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-24hnm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-24hnm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-24hnm
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 40s default-scheduler Successfully assigned default/ingress-nginx-66df6b6d9-mhmj9 to node4
Normal Pulled 18s (x3 over 39s) kubelet, node4 Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.20.0" already present on machine
Normal Created 18s (x3 over 39s) kubelet, node4 Created container
Normal Started 17s (x3 over 39s) kubelet, node4 Started container
Warning BackOff 11s (x5 over 36s) kubelet, node4 Back-off restarting failed container
logs of pod:
[master#master1 config]$ sudo kubectl logs ingress-nginx-66df6b6d9-mhmj9
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: 0.20.0
Build: git-e8d8103
Repository: https://github.com/kubernetes/ingress-nginx.git
-------------------------------------------------------------------------------
nginx version: nginx/1.15.5
W1108 08:47:16.081042 6 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I1108 08:47:16.081234 6 main.go:196] Creating API client for https://10.96.0.1:443
I1108 08:47:16.122315 6 main.go:240] Running in Kubernetes cluster version v1.11 (v1.11.3) - git (clean) commit a4529464e4629c21224b3d52edfe0ea91b072862 - platform linux/amd64
F1108 08:47:16.123661 6 main.go:97] ✖ The cluster seems to be running with a restrictive Authorization mode and the Ingress controller does not have the required permissions to operate normally.
Could experts here drop me some hints?
You need set ingress-nginx to use a seperate serviceaccount and give neccessary privilege to the serviceaccount.
here is a example:
apiVersion: v1
kind: ServiceAccount
metadata:
name: lb
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: nginx-ingress-normal
rules:
- apiGroups:
- ""
resources:
- configmaps
- endpoints
- nodes
- pods
- secrets
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- apiGroups:
- ""
resources:
- services
verbs:
- get
- list
- watch
- apiGroups:
- "extensions"
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- apiGroups:
- "extensions"
resources:
- ingresses/status
verbs:
- update
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
name: nginx-ingress-minimal
namespace: kube-system
rules:
- apiGroups:
- ""
resources:
- configmaps
- pods
- secrets
- namespaces
verbs:
- get
- apiGroups:
- ""
resources:
- configmaps
resourceNames:
- "ingress-controller-leader-dev"
- "ingress-controller-leader-prod"
verbs:
- get
- update
- apiGroups:
- ""
resources:
- configmaps
verbs:
- create
- apiGroups:
- ""
resources:
- endpoints
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
name: nginx-ingress-minimal
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: nginx-ingress-minimal
subjects:
- kind: ServiceAccount
name: lb
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: nginx-ingress-normal
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: nginx-ingress-normal
subjects:
- kind: ServiceAccount
name: lb
namespace: kube-system
For the past couple of days we have been experiencing an intermittent deployment failure when deploying (via Helm) to Kubernetes v1.11.2.
When it fails, kubectl describe <deployment> usually reports that the container failed to create:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1s default-scheduler Successfully assigned default/pod-fc5c8d4b8-99npr to fh1-node04
Normal Pulling 0s kubelet, fh1-node04 pulling image "docker-registry.internal/pod:0e5a0cb1c0e32b6d0e603333ebb81ade3427ccdd"
Error from server (BadRequest): container "pod" in pod "pod-fc5c8d4b8-99npr" is waiting to start: ContainerCreating
and the only issue we can find in the kubelet logs is:
58468 kubelet_pods.go:146] Mount cannot be satisfied for container "pod", because the volume is missing or the volume mounter is nil: {Name:default-token-q8k7w ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}
58468 kuberuntime_manager.go:733] container start failed: CreateContainerConfigError: cannot find volume "default-token-q8k7w" to mount container start failed: CreateContainerConfigError: cannot find volume "default-token-q8k7w" to mount into container "pod"
It's intermittent which means it fails around once in every 20 or so deployments. Re-running the deployment works as expected.
The cluster and node health all look fine at the time of the deployment, so we are at a loss as to where to go from here. Looking for advice on where to start next on diagnosing the issue.
EDIT: As requested, the deployment file is generated via a Helm template and the output is shown below. For further information, the same Helm template is used for a lot of our services, but only this particular service has this intermittent issue:
apiVersion: apps/v1beta2
kind: Deployment
metadata:
name: pod
labels:
app: pod
chart: pod-0.1.0
release: pod
heritage: Tiller
environment: integration
annotations:
kubernetes.io/change-cause: https://github.com/path_to_release
spec:
replicas: 2
revisionHistoryLimit: 3
selector:
matchLabels:
app: pod
release: pod
environment: integration
strategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
labels:
app: pod
release: pod
environment: integration
spec:
containers:
- name: pod
image: "docker-registry.internal/pod:0e5a0cb1c0e32b6d0e603333ebb81ade3427ccdd"
env:
- name: VAULT_USERNAME
valueFrom:
secretKeyRef:
name: "pod-integration"
key: username
- name: VAULT_PASSWORD
valueFrom:
secretKeyRef:
name: "pod-integration"
key: password
imagePullPolicy: IfNotPresent
command: ['mix', 'phx.server']
ports:
- name: http
containerPort: 80
protocol: TCP
envFrom:
- configMapRef:
name: pod
livenessProbe:
httpGet:
path: /api/health
port: http
initialDelaySeconds: 10
readinessProbe:
httpGet:
path: /api/health
port: http
initialDelaySeconds: 10
resources:
limits:
cpu: 750m
memory: 200Mi
requests:
cpu: 500m
memory: 150Mi