Unable to add a K8s service as prometheus target - kubernetes

I want my prometheus server to scrape metrics from a pod.
I followed these steps:
Created a pod using a deployment: kubectl apply -f sample-app.deploy.yaml
Exposed it as a service: kubectl apply -f sample-app.service.yaml
Deployed the Prometheus server using helm upgrade -i prometheus prometheus-community/prometheus -f prometheus-values.yaml
Created a ServiceMonitor using kubectl apply -f service-monitor.yaml to add a target for Prometheus.
All pods are running, but when I open the Prometheus dashboard I don't see the sample-app service as a Prometheus target under Status > Targets in the dashboard UI.
I've verified the following:
I can see sample-app when I execute kubectl get servicemonitors
I can see that sample-app exposes metrics in Prometheus format at /metrics
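(For reference, the /metrics check can be reproduced with a port-forward; a sketch, assuming the service listens on port 8080 as in the scrape target further down:
kubectl -n prom port-forward svc/sample-app 8080:8080
curl http://localhost:8080/metrics
)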
At this point I debugged further: I entered the Prometheus pod using
kubectl exec -it pod/prometheus-server-65b759cb95-dxmkm -c prometheus-server sh
and saw that the Prometheus configuration (/etc/config/prometheus.yml) didn't have sample-app as one of the jobs, so I edited the ConfigMap using
kubectl edit cm prometheus-server -o yaml
and added:
- job_name: sample-app
  static_configs:
    - targets:
        - sample-app:8080
All other fields, such as scrape_interval and scrape_timeout, are left at their defaults.
I can see the change reflected in /etc/config/prometheus.yml, but the Prometheus dashboard still doesn't show sample-app as a target under Status > Targets.
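(A side note on this manual route: ConfigMap edits can take up to a minute to propagate to the mounted volume, and Prometheus then has to reload. Since the server below runs with --web.enable-lifecycle, a reload can also be triggered by hand; a sketch, run from a local machine:
kubectl -n prom port-forward deploy/prometheus-server 9090:9090 &
curl -X POST http://localhost:9090/-/reload
The configmap-reload sidecar normally issues the same POST to /-/reload automatically once the mounted file changes.)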
Following are the YAMLs for prometheus-server and the ServiceMonitor.
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"name":"prometheus-server-configmap-reload"},{"name":"prometheus-server"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server-configmap-reload"},{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server"}]},"modified":true}'
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: prom
  creationTimestamp: "2021-06-24T10:42:31Z"
  generation: 1
  labels:
    app: prometheus
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-14.2.1
    component: server
    heritage: Helm
    release: prometheus
  name: prometheus-server
  namespace: prom
  resourceVersion: "6983855"
  selfLink: /apis/apps/v1/namespaces/prom/deployments/prometheus-server
  uid: <some-uid>
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prometheus
        chart: prometheus-14.2.1
        component: server
        heritage: Helm
        release: prometheus
    spec:
      containers:
      - args:
        - --volume-dir=/etc/config
        - --webhook-url=http://127.0.0.1:9090/-/reload
        image: jimmidyson/configmap-reload:v0.5.0
        imagePullPolicy: IfNotPresent
        name: prometheus-server-configmap-reload
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
          readOnly: true
      - args:
        - --storage.tsdb.retention.time=15d
        - --config.file=/etc/config/prometheus.yml
        - --storage.tsdb.path=/data
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        - --web.enable-lifecycle
        image: quay.io/prometheus/prometheus:v2.26.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/healthy
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 10
        name: prometheus-server
        ports:
        - containerPort: 9090
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /-/ready
            port: 9090
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 4
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        securityContext:
          capabilities:
            drop:
            - NET_RAW
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
        - mountPath: /data
          name: storage-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: prometheus-server
      serviceAccountName: prometheus-server
      terminationGracePeriodSeconds: 300
      volumes:
      - configMap:
          defaultMode: 420
          name: prometheus-server
        name: config-volume
      - name: storage-volume
        persistentVolumeClaim:
          claimName: prometheus-server
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-06-24T10:43:25Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-06-24T10:42:31Z"
    lastUpdateTime: "2021-06-24T10:43:25Z"
    message: ReplicaSet "prometheus-server-65b759cb95" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
YAML for the ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"creationTimestamp":"2021-06-24T07:55:58Z","generation":1,"labels":{"app":"sample-app","release":"prometheus"},"name":"sample-app","namespace":"prom","resourceVersion":"6884573","selfLink":"/apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app","uid":"34644b62-eb4f-4ab1-b9df-b22811e40b4c"},"spec":{"endpoints":[{"port":"http"}],"selector":{"matchLabels":{"app":"sample-app","release":"prometheus"}}}}
  creationTimestamp: "2021-06-24T07:55:58Z"
  generation: 2
  labels:
    app: sample-app
    release: prometheus
  name: sample-app
  namespace: prom
  resourceVersion: "6904642"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app
  uid: <some-uid>
spec:
  endpoints:
  - port: http
  selector:
    matchLabels:
      app: sample-app
      release: prometheus

You need to use the prometheus-community/kube-prometheus-stack chart, which includes the Prometheus operator, in order to have Prometheus' configuration update automatically based on ServiceMonitor resources.
The prometheus-community/prometheus chart you used does not include the Prometheus operator that watches for ServiceMonitor resources in the Kubernetes API and updates the Prometheus server's ConfigMap accordingly.
It seems that you have the necessary CustomResourceDefinitions (CRDs) installed in your cluster, otherwise you would not have been able to create a ServiceMonitor resource. These are not included in the prometheus-community/prometheus chart so perhaps they were added to your cluster previously.
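For reference, a minimal sketch of switching to the operator-based chart (the release name and namespace are taken from the question):
helm upgrade -i prometheus prometheus-community/kube-prometheus-stack -n prom \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
Note that with default values the operator's Prometheus only selects ServiceMonitors carrying the chart's release label; your ServiceMonitor already has release: prometheus, and the serviceMonitorSelectorNilUsesHelmValues=false override above relaxes the selection entirely, so either way it should be picked up.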

Invalid host. To browse Nexus, click here/. To use the Docker registry, point your client at — when accessing Nexus in Kubernetes

I am using Helm 3 to install Nexus in Kubernetes v1.18:
helm install stable/sonatype-nexus --name=nexus
and then exposing Nexus to the outside with Traefik 2.x, using the domain nexus.dolphin.com. But when I use the domain to access the Nexus service, it gives me this message:
Invalid host. To browse Nexus, click here/. To use the Docker registry, point your client
I have read this question, but it doesn't seem to suit my situation. This is my Nexus YAML config now:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nexus-sonatype-nexus
  namespace: infrastructure
  selfLink: /apis/apps/v1/namespaces/infrastructure/deployments/nexus-sonatype-nexus
  uid: 023de15b-19eb-442d-8375-11532825919d
  resourceVersion: '1710210'
  generation: 3
  creationTimestamp: '2020-08-16T07:17:07Z'
  labels:
    app: sonatype-nexus
    app.kubernetes.io/managed-by: Helm
    chart: sonatype-nexus-1.23.1
    fullname: nexus-sonatype-nexus
    heritage: Helm
    release: nexus
  annotations:
    deployment.kubernetes.io/revision: '1'
    meta.helm.sh/release-name: nexus
    meta.helm.sh/release-namespace: infrastructure
  managedFields:
    - manager: Go-http-client
      operation: Update
      apiVersion: apps/v1
      time: '2020-08-16T07:17:07Z'
      fieldsType: FieldsV1
    - manager: kube-controller-manager
      operation: Update
      apiVersion: apps/v1
      time: '2020-08-18T16:26:34Z'
      fieldsType: FieldsV1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sonatype-nexus
      release: nexus
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: sonatype-nexus
        release: nexus
    spec:
      volumes:
        - name: nexus-sonatype-nexus-data
          persistentVolumeClaim:
            claimName: nexus-sonatype-nexus-data
        - name: nexus-sonatype-nexus-backup
          emptyDir: {}
      containers:
        - name: nexus
          image: 'sonatype/nexus3:3.20.1'
          ports:
            - name: nexus-docker-g
              containerPort: 5003
              protocol: TCP
            - name: nexus-http
              containerPort: 8081
              protocol: TCP
          env:
            - name: install4jAddVmParams
              value: >-
                -Xms1200M -Xmx1200M -XX:MaxDirectMemorySize=2G
                -XX:+UnlockExperimentalVMOptions
                -XX:+UseCGroupMemoryLimitForHeap
            - name: NEXUS_SECURITY_RANDOMPASSWORD
              value: 'false'
          resources: {}
          volumeMounts:
            - name: nexus-sonatype-nexus-data
              mountPath: /nexus-data
            - name: nexus-sonatype-nexus-backup
              mountPath: /nexus-data/backup
          livenessProbe:
            httpGet:
              path: /
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 6
          readinessProbe:
            httpGet:
              path: /
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 6
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
        - name: nexus-proxy
          image: 'quay.io/travelaudience/docker-nexus-proxy:2.5.0'
          ports:
            - name: nexus-proxy
              containerPort: 8080
              protocol: TCP
          env:
            - name: ALLOWED_USER_AGENTS_ON_ROOT_REGEX
              value: GoogleHC
            - name: CLOUD_IAM_AUTH_ENABLED
              value: 'false'
            - name: BIND_PORT
              value: '8080'
            - name: ENFORCE_HTTPS
              value: 'false'
            - name: NEXUS_DOCKER_HOST
            - name: NEXUS_HTTP_HOST
            - name: UPSTREAM_DOCKER_PORT
              value: '5003'
            - name: UPSTREAM_HTTP_PORT
              value: '8081'
            - name: UPSTREAM_HOST
              value: localhost
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      serviceAccountName: nexus-sonatype-nexus
      serviceAccount: nexus-sonatype-nexus
      securityContext:
        fsGroup: 2000
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 3
  replicas: 1
  updatedReplicas: 1
  readyReplicas: 1
  availableReplicas: 1
  conditions:
    - type: Progressing
      status: 'True'
      lastUpdateTime: '2020-08-18T16:23:54Z'
      lastTransitionTime: '2020-08-18T16:23:54Z'
      reason: NewReplicaSetAvailable
      message: >-
        ReplicaSet "nexus-sonatype-nexus-79fd4488d5" has successfully
        progressed.
    - type: Available
      status: 'True'
      lastUpdateTime: '2020-08-18T16:26:34Z'
      lastTransitionTime: '2020-08-18T16:26:34Z'
      reason: MinimumReplicasAvailable
      message: Deployment has minimum availability.
Why can't the domain reach Nexus by default, and what should I do to access Nexus via the domain?
From the documentation, you should set a property of the Helm chart, nexusProxy.env.nexusHttpHost, to nexus.dolphin.com.
The Docker image used here has a proxy that lets you reach the Nexus HTTP service and the Nexus Docker registry on different domains; if you don't specify either, you get the behaviour you're seeing.
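For example, a sketch using the chart and release names from the question:
helm upgrade nexus stable/sonatype-nexus \
  --set nexusProxy.env.nexusHttpHost=nexus.dolphin.com
Equivalently, populating the currently empty NEXUS_HTTP_HOST environment variable on the nexus-proxy container in the deployment above achieves the same result.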

Helm chart upgrade with different Docker image tag

I have a question about a Helm upgrade. I'm working on a Helm chart which deploys a pod with the Docker image tag 2.190.1-alpine.
The release name is jenkins.
I want to change the Docker image tag in the Helm chart to 2.240.
I launch:
helm upgrade jenkins codecentric/jenkins --set image.tag=2.2
But I get this error:
Error: UPGRADE FAILED: unable to decode "": resource.metadataOnlyObject.ObjectMeta: v1.ObjectMeta.Labels: ReadString: expects " or n, but found 2, error found in #10 byte of ...|version":2.240,"helm.|..., bigger context ...|s.io/name":"jenkins","app.kubernetes.io/version":2.240,"helm.sh/chart":"jenkins-1.7.0"},"name":"myReleases|...
Has anyone had experience with this?
Here is a deployment file:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2020-02-06T22:39:51Z"
  generation: 1
  labels:
    app.kubernetes.io/component: master
    app.kubernetes.io/instance: jenkins
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: jenkins
    helm.sh/chart: jenkins-1.5.1
  name: jenkins
  namespace: default
  resourceVersion: "28697012"
  selfLink: /apis/apps/v1/namespaces/default/deployments/jenkins
  uid: 9934dea3-b3a1-4109-b725-da5410aacc5f
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: master
      app.kubernetes.io/instance: jenkins
      app.kubernetes.io/name: jenkins
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        checksum/init: e43addd28e27d8ab36052943e5586515d252a27c3ddd85e15c43e99501ab1c0b
        checksum/ref: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: jenkins
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: jenkins
        helm.sh/chart: jenkins-1.5.1
    spec:
      containers:
      - env:
        - name: JENKINS_SLAVE_AGENT_PORT
          value: "50000"
        - name: JAVA_OPTS
          value: -Dhudson.slaves.NodeProvisioner.initialDelay=0 -Dhudson.model.LoadStatistics.decay=0.7
            -Dhudson.slaves.NodeProvisioner.MARGIN=30 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.6
            -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2
            -XshowSettings:vm
        image: jenkins/jenkins:2.190.1-alpine
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /login
            port: http
            scheme: HTTP
          initialDelaySeconds: 90
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: jenkins-master
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        - containerPort: 50000
          name: agent
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /login
            port: http
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/jenkins_home
          name: jenkins-home
        - mountPath: /usr/share/jenkins/ref/plugins
          name: jenkins-plugins
      dnsPolicy: ClusterFirst
      initContainers:
      - args:
        - jenkins/jenkins
        - 2.190.1-alpine
        - "false"
        command:
        - /init/init.sh
        image: jenkins/jenkins:2.190.1-alpine
        imagePullPolicy: IfNotPresent
        name: jenkins-init
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /init
          name: init
        - mountPath: /var/jenkins_home
          name: jenkins-home
        - mountPath: /usr/share/jenkins/ref/plugins
          name: jenkins-plugins
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccount: default
      serviceAccountName: default
      terminationGracePeriodSeconds: 30
      volumes:
      - name: jenkins-home
        persistentVolumeClaim:
          claimName: jenkins-pvc
      - emptyDir: {}
        name: jenkins-plugins
      - configMap:
          defaultMode: 365
          name: jenkins-init
        name: init
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2020-02-06T22:39:51Z"
    lastUpdateTime: "2020-02-06T22:40:32Z"
    message: ReplicaSet "jenkins-866f548f55" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-06-05T12:13:56Z"
    lastUpdateTime: "2020-06-05T12:13:56Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
Is it safe to edit the deployment file directly?
I noticed that the error lists chart version 1.7.0 even though mine is 1.5.1. Maybe I should state in the upgrade command that I want to stay on the old chart version?
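One hint from the error text itself: it suggests the chart renders image.tag into the app.kubernetes.io/version label, and that the value is being emitted as a bare YAML number, while label values must be strings. A sketch of forcing a string value (assuming the chart wires image.tag through as the error suggests):
helm upgrade jenkins codecentric/jenkins --set-string image.tag=2.240
--set would parse 2.240 as a number; --set-string keeps it as the string "2.240".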

How can I give grafana user appropriate permission so that it can start successfully?

Environment:
- Kubernetes provider: GKE
- Kubernetes version: v1.13.12-gke.25
- Grafana version: 6.6.2 (official image)
Grafana deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      name: grafana
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:6.6.2
          ports:
            - name: grafana
              containerPort: 3000
          # securityContext:
          #   runAsUser: 104
          #   allowPrivilegeEscalation: true
          resources:
            limits:
              memory: "1Gi"
              cpu: "500m"
            requests:
              memory: "500Mi"
              cpu: "100m"
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-storage
      volumes:
        - name: grafana-storage
          persistentVolumeClaim:
            claimName: grafana-pvc
Problem
When I deployed this Grafana dashboard the first time, it was working fine. After some time I restarted the pod to check whether the volume mount was working or not. After restarting, I got the error below.
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
What I understand from this error is that the user cannot create these files. How can I give this user the appropriate permissions so that Grafana starts successfully?
I recreated your deployment with an appropriate PVC and noticed that the Grafana pod was failing.
Output of command: $ kubectl get pods -n monitoring
NAME                       READY   STATUS   RESTARTS   AGE
grafana-6466cd95b5-4g95f   0/1     Error    2          65s
Further investigation pointed to the same errors as yours:
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
This error showed on the first creation of the pod and the deployment. There was no need to recreate any pods.
What I did to make it work was to edit your deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      name: grafana
      labels:
        app: grafana
    spec:
      securityContext:
        runAsUser: 472
        fsGroup: 472
      containers:
        - name: grafana
          image: grafana/grafana:6.6.2
          ports:
            - name: grafana
              containerPort: 3000
          resources:
            limits:
              memory: "1Gi"
              cpu: "500m"
            requests:
              memory: "500Mi"
              cpu: "100m"
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-storage
      volumes:
        - name: grafana-storage
          persistentVolumeClaim:
            claimName: grafana-pvc
Please take a specific look at this part:
securityContext:
  runAsUser: 472
  fsGroup: 472
It is a setting described in the official documentation: Kubernetes.io: set the security context for a pod
Please take a look on this Github issue which is similar to yours and pointed me to solution that allowed pod to spawn correctly:
https://github.com/grafana/grafana-docker/issues/167
Grafana had some major updates starting from version 5.1. Please take a look: Grafana.com: Docs: Migrate to v5.1 or later
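As a quick check after applying the change (the pod name below is illustrative), the container should now run as uid/gid 472 and own the data directory:
kubectl -n monitoring exec <grafana-pod> -- id
kubectl -n monitoring exec <grafana-pod> -- ls -ld /var/lib/grafana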
Please let me know if this helps.
On v8.0, I fixed it by setting runAsUser: 0.
It works.
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  ports:
    - name: grafana-tcp
      port: 3000
      protocol: TCP
      targetPort: 3000
  selector:
    project: grafana
  type: LoadBalancer
status:
  loadBalancer: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    project: grafana
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      project: grafana
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        project: grafana
      name: grafana
    spec:
      securityContext:
        runAsUser: 0
      containers:
        - image: grafana/grafana
          name: grafana
          ports:
            - containerPort: 3000
              protocol: TCP
          resources: {}
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-volume
      volumes:
        - name: grafana-volume
          hostPath:
            # directory location on host
            path: /opt/grafana
            # this field is optional
            type: DirectoryOrCreate
      restartPolicy: Always
status: {}

Calico: networkPlugin cni failed to set up pod, i/o timeout

I have got an issue with deploying some pods on my k8s node. The error is the following:
Failed create pod sandbox: rpc error: code = Unknown desc = failed to
set up sandbox container
"7da8bce09dd6820a65754073b1b4e52e640291dcb82f1da87ae99570c6964d1b"
network for pod "webservices-8675d4667d-7mdf9": networkPlugin cni
failed to set up pod "webservices-8675d4667d-7mdf9_default" network:
Get https://[10.233.0.1]:443/api/v1/namespaces/default: dial tcp
10.233.0.1:443: i/o timeout
However, some pods are deployed, for example kubernetes-dashboard.
Update: here is the node list (kubectl get nodes --show-labels):
NAME                   STATUS   ROLES    AGE     VERSION   LABELS
k8s-master.mariyo.eu   Ready    master   3d15h   v1.16.6   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master.mariyo.eu,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node-1.mariyo.eu   Ready    <none>   3d15h   v1.16.6   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1.mariyo.eu,kubernetes.io/os=linux
Deployment for coredns:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: coredns
  namespace: kube-system
  selfLink: /apis/apps/v1/namespaces/kube-system/deployments/coredns
  uid: bd5451ec-2a33-443d-8519-ffcec935ac0c
  resourceVersion: '397508'
  generation: 2
  creationTimestamp: '2020-01-24T16:14:37Z'
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: kube-dns
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: coredns
  annotations:
    deployment.kubernetes.io/revision: '1'
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile","k8s-app":"kube-dns","kubernetes.io/cluster-service":"true","kubernetes.io/name":"coredns"},"name":"coredns","namespace":"kube-system"},"spec":{"selector":{"matchLabels":{"k8s-app":"kube-dns"}},"strategy":{"rollingUpdate":{"maxSurge":"10%","maxUnavailable":0},"type":"RollingUpdate"},"template":{"metadata":{"annotations":{"seccomp.security.alpha.kubernetes.io/pod":"docker/default"},"labels":{"k8s-app":"kube-dns"}},"spec":{"affinity":{"nodeAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"preference":{"matchExpressions":[{"key":"node-role.kubernetes.io/master","operator":"In","values":[""]}]},"weight":100}]},"podAntiAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{"matchLabels":{"k8s-app":"kube-dns"}},"topologyKey":"kubernetes.io/hostname"}]}},"containers":[{"args":["-conf","/etc/coredns/Corefile"],"image":"docker.io/coredns/coredns:1.6.0","imagePullPolicy":"IfNotPresent","livenessProbe":{"failureThreshold":10,"httpGet":{"path":"/health","port":8080,"scheme":"HTTP"},"successThreshold":1,"timeoutSeconds":5},"name":"coredns","ports":[{"containerPort":53,"name":"dns","protocol":"UDP"},{"containerPort":53,"name":"dns-tcp","protocol":"TCP"},{"containerPort":9153,"name":"metrics","protocol":"TCP"}],"readinessProbe":{"failureThreshold":10,"httpGet":{"path":"/ready","port":8181,"scheme":"HTTP"},"successThreshold":1,"timeoutSeconds":5},"resources":{"limits":{"memory":"170Mi"},"requests":{"cpu":"100m","memory":"70Mi"}},"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"add":["NET_BIND_SERVICE"],"drop":["all"]},"readOnlyRootFilesystem":true},"volumeMounts":[{"mountPath":"/etc/coredns","name":"config-volume"}]}],"dnsPolicy":"Default","nodeSelector":{"beta.kubernetes.io/os":"linux"},"priorityClassName":"system-cluster-critical","serviceAccountName":"coredns","tolerations":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"},{"key":"CriticalAddonsOnly","operator":"Exists"}],"volumes":[{"configMap":{"items":[{"key":"Corefile","path":"Corefile"}],"name":"coredns"},"name":"config-volume"}]}}}}
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kube-dns
      annotations:
        seccomp.security.alpha.kubernetes.io/pod: docker/default
    spec:
      volumes:
        - name: config-volume
          configMap:
            name: coredns
            items:
              - key: Corefile
                path: Corefile
            defaultMode: 420
      containers:
        - name: coredns
          image: 'docker.io/coredns/coredns:1.6.0'
          args:
            - '-conf'
            - /etc/coredns/Corefile
          ports:
            - name: dns
              containerPort: 53
              protocol: UDP
            - name: dns-tcp
              containerPort: 53
              protocol: TCP
            - name: metrics
              containerPort: 9153
              protocol: TCP
          resources:
            limits:
              memory: 170Mi
            requests:
              cpu: 100m
              memory: 70Mi
          volumeMounts:
            - name: config-volume
              mountPath: /etc/coredns
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
              scheme: HTTP
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8181
              scheme: HTTP
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 10
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - all
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: Default
      nodeSelector:
        beta.kubernetes.io/os: linux
      serviceAccountName: coredns
      serviceAccount: coredns
      securityContext: {}
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - ''
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  k8s-app: kube-dns
              topologyKey: kubernetes.io/hostname
      schedulerName: default-scheduler
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
        - key: CriticalAddonsOnly
          operator: Exists
      priorityClassName: system-cluster-critical
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 10%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 2
  replicas: 2
  updatedReplicas: 2
  readyReplicas: 1
  availableReplicas: 1
  unavailableReplicas: 1
  conditions:
    - type: Progressing
      status: 'True'
      lastUpdateTime: '2020-01-24T16:14:42Z'
      lastTransitionTime: '2020-01-24T16:14:37Z'
      reason: NewReplicaSetAvailable
      message: ReplicaSet "coredns-58687784f9" has successfully progressed.
    - type: Available
      status: 'False'
      lastUpdateTime: '2020-01-27T17:42:57Z'
      lastTransitionTime: '2020-01-27T17:42:57Z'
      reason: MinimumReplicasUnavailable
      message: Deployment does not have minimum availability.
Deployment for webservices:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: webservices
  namespace: default
  selfLink: /apis/apps/v1/namespaces/default/deployments/webservices
  uid: da75d3d8-92f4-4d06-86d6-e2fb325806a5
  resourceVersion: '398529'
  generation: 1
  creationTimestamp: '2020-01-27T08:05:16Z'
  labels:
    run: webservices
  annotations:
    deployment.kubernetes.io/revision: '1'
spec:
  replicas: 5
  selector:
    matchLabels:
      run: webservices
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: webservices
    spec:
      containers:
        - name: webservices
          image: nginx
          ports:
            - containerPort: 80
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 1
  replicas: 5
  updatedReplicas: 5
  unavailableReplicas: 5
  conditions:
    - type: Available
      status: 'False'
      lastUpdateTime: '2020-01-27T08:05:16Z'
      lastTransitionTime: '2020-01-27T08:05:16Z'
      reason: MinimumReplicasUnavailable
      message: Deployment does not have minimum availability.
    - type: Progressing
      status: 'False'
      lastUpdateTime: '2020-01-27T17:52:58Z'
      lastTransitionTime: '2020-01-27T17:52:58Z'
      reason: ProgressDeadlineExceeded
      message: ReplicaSet "webservices-8675d4667d" has timed out progressing.
Finally, I decided to reinstall the nodes from Debian 10 to Ubuntu 18.04, and everything now works as expected.
Thank you for your time.
The problem is that kube-proxy isn't functioning correctly: 10.233.0.1 is the Kubernetes API service address, which kube-proxy is responsible for wiring up. You should check the kube-proxy logs and confirm that it is healthy and creating the iptables rules for the Kubernetes services.
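For example, a sketch of those checks (the k8s-app=kube-proxy label is the usual one on kubeadm/kubespray clusters; 10.233.0.1 is taken from the error message above):
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=100
# kube-proxy in iptables mode should have created rules for the API service IP
sudo iptables-save | grep 10.233.0.1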
Take a look here: calico-timeout-pod.
I had to set the following on the worker node as well, before joining it, for it to work:
sudo sysctl net.bridge.bridge-nf-call-iptables=1
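To make that setting survive a reboot, the usual pattern is a sysctl drop-in (a sketch):
echo 'net.bridge.bridge-nf-call-iptables = 1' | sudo tee /etc/sysctl.d/99-kubernetes.conf
sudo sysctl --system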
I was having a similar issue. I am using microk8s in my instance. It seems the node needs to advertise itself to the cluster. I hope this points you in the right direction (repost from GitHub):
microk8s stop
# or for workers: sudo snap stop microk8s
sudo vim.tiny /var/snap/microk8s/current/args/kubelet
# Add this to bottom: --node-ip=<this-specific-node-lan-ip>
sudo vim.tiny /var/snap/microk8s/current/args/kube-apiserver
# Add this to bottom: --advertise-address=<this-specific-node-lan-ip>
microk8s start
# or for workers: sudo snap start microk8s

The l7-default-backend deployment gets reverted when I edit it

I upgraded my GKE clusters to Kubernetes 1.5.6 a couple of days ago. I used to be able to scale the l7-default-backend deployment to 3 replicas and increase the CPU resource limits, but now my changes get reverted to their defaults of 1 replica and a 10m CPU limit.
Interestingly enough, when I added the following:
nodeSelector:
  cloud.google.com/gke-nodepool: default-pool
it persisted without a problem.
Here's the current deployment manifest:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "4"
    kubectl.kubernetes.io/last-applied-configuration: '{"kind":"Deployment","apiVersion":"extensions/v1beta1","metadata":{"name":"l7-default-backend","namespace":"kube-system","creationTimestamp":null,"labels":{"k8s-app":"glbc","kubernetes.io/cluster-service":"true","kubernetes.io/name":"GLBC"}},"spec":{"replicas":1,"selector":{"matchLabels":{"k8s-app":"glbc"}},"template":{"metadata":{"creationTimestamp":null,"labels":{"k8s-app":"glbc","name":"glbc"}},"spec":{"containers":[{"name":"default-http-backend","image":"gcr.io/google_containers/defaultbackend:1.0","ports":[{"containerPort":8080}],"resources":{"limits":{"cpu":"10m","memory":"20Mi"},"requests":{"cpu":"10m","memory":"20Mi"}},"livenessProbe":{"httpGet":{"path":"/healthz","port":8080,"scheme":"HTTP"},"initialDelaySeconds":30,"timeoutSeconds":5}}]}},"strategy":{}},"status":{}}'
  creationTimestamp: 2017-03-23T23:30:12Z
  generation: 9
  labels:
    k8s-app: glbc
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: GLBC
  name: l7-default-backend
  namespace: kube-system
  resourceVersion: "40149922"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/deployments/l7-default-backend
  uid: a9772d26-1020-11e7-b9a8-42010af001d0
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: glbc
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: glbc
        name: glbc
    spec:
      containers:
      - image: gcr.io/google_containers/defaultbackend:1.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: default-http-backend
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
        terminationMessagePath: /dev/termination-log
      dnsPolicy: ClusterFirst
      nodeSelector:
        cloud.google.com/gke-nodepool: default-pool
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2017-04-13T19:19:35Z
    lastUpdateTime: 2017-04-13T19:19:35Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 9
  replicas: 1
  updatedReplicas: 1
How can I scale the l7-default-backend successfully?
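One observation that may explain the behaviour: the deployment carries the kubernetes.io/cluster-service: "true" label, which marks it as managed by the cluster addon manager, and the addon manager periodically reconciles managed objects back to their shipped manifests. A purely illustrative sketch for inspecting the labels involved:
kubectl -n kube-system get deployment l7-default-backend -o jsonpath='{.metadata.labels}'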