I have an issue deploying some pods on my k8s node. The error is the following:
Failed create pod sandbox: rpc error: code = Unknown desc = failed to
set up sandbox container
"7da8bce09dd6820a65754073b1b4e52e640291dcb82f1da87ae99570c6964d1b"
network for pod "webservices-8675d4667d-7mdf9": networkPlugin cni
failed to set up pod "webservices-8675d4667d-7mdf9_default" network:
Get https://[10.233.0.1]:443/api/v1/namespaces/default: dial tcp
10.233.0.1:443: i/o timeout
However, some pods are deployed successfully, for example kubernetes-dashboard.
Update:
NAME STATUS ROLES AGE VERSION LABELS
k8s-master.mariyo.eu Ready master 3d15h v1.16.6 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master.mariyo.eu,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node-1.mariyo.eu Ready <none> 3d15h v1.16.6 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1.mariyo.eu,kubernetes.io/os=linux
Deployment for coredns:
kind: Deployment
apiVersion: apps/v1
metadata:
name: coredns
namespace: kube-system
selfLink: /apis/apps/v1/namespaces/kube-system/deployments/coredns
uid: bd5451ec-2a33-443d-8519-ffcec935ac0c
resourceVersion: '397508'
generation: 2
creationTimestamp: '2020-01-24T16:14:37Z'
labels:
addonmanager.kubernetes.io/mode: Reconcile
k8s-app: kube-dns
kubernetes.io/cluster-service: 'true'
kubernetes.io/name: coredns
annotations:
deployment.kubernetes.io/revision: '1'
kubectl.kubernetes.io/last-applied-configuration: >
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile","k8s-app":"kube-dns","kubernetes.io/cluster-service":"true","kubernetes.io/name":"coredns"},"name":"coredns","namespace":"kube-system"},"spec":{"selector":{"matchLabels":{"k8s-app":"kube-dns"}},"strategy":{"rollingUpdate":{"maxSurge":"10%","maxUnavailable":0},"type":"RollingUpdate"},"template":{"metadata":{"annotations":{"seccomp.security.alpha.kubernetes.io/pod":"docker/default"},"labels":{"k8s-app":"kube-dns"}},"spec":{"affinity":{"nodeAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"preference":{"matchExpressions":[{"key":"node-role.kubernetes.io/master","operator":"In","values":[""]}]},"weight":100}]},"podAntiAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{"matchLabels":{"k8s-app":"kube-dns"}},"topologyKey":"kubernetes.io/hostname"}]}},"containers":[{"args":["-conf","/etc/coredns/Corefile"],"image":"docker.io/coredns/coredns:1.6.0","imagePullPolicy":"IfNotPresent","livenessProbe":{"failureThreshold":10,"httpGet":{"path":"/health","port":8080,"scheme":"HTTP"},"successThreshold":1,"timeoutSeconds":5},"name":"coredns","ports":[{"containerPort":53,"name":"dns","protocol":"UDP"},{"containerPort":53,"name":"dns-tcp","protocol":"TCP"},{"containerPort":9153,"name":"metrics","protocol":"TCP"}],"readinessProbe":{"failureThreshold":10,"httpGet":{"path":"/ready","port":8181,"scheme":"HTTP"},"successThreshold":1,"timeoutSeconds":5},"resources":{"limits":{"memory":"170Mi"},"requests":{"cpu":"100m","memory":"70Mi"}},"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"add":["NET_BIND_SERVICE"],"drop":["all"]},"readOnlyRootFilesystem":true},"volumeMounts":[{"mountPath":"/etc/coredns","name":"config-volume"}]}],"dnsPolicy":"Default","nodeSelector":{"beta.kubernetes.io/os":"linux"},"priorityClassName":"system-cluster-critical","serviceAccountName":"coredns","tolerations":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"},{"key":"CriticalAddonsOnly","operator":"Exists"}],"volumes":[{"configMap":{"items":[{"key":"Corefile","path":"Corefile"}],"name":"coredns"},"name":"config-volume"}]}}}}
spec:
replicas: 2
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
creationTimestamp: null
labels:
k8s-app: kube-dns
annotations:
seccomp.security.alpha.kubernetes.io/pod: docker/default
spec:
volumes:
- name: config-volume
configMap:
name: coredns
items:
- key: Corefile
path: Corefile
defaultMode: 420
containers:
- name: coredns
image: 'docker.io/coredns/coredns:1.6.0'
args:
- '-conf'
- /etc/coredns/Corefile
ports:
- name: dns
containerPort: 53
protocol: UDP
- name: dns-tcp
containerPort: 53
protocol: TCP
- name: metrics
containerPort: 9153
protocol: TCP
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
volumeMounts:
- name: config-volume
mountPath: /etc/coredns
livenessProbe:
httpGet:
path: /health
port: 8080
scheme: HTTP
timeoutSeconds: 5
periodSeconds: 10
successThreshold: 1
failureThreshold: 10
readinessProbe:
httpGet:
path: /ready
port: 8181
scheme: HTTP
timeoutSeconds: 5
periodSeconds: 10
successThreshold: 1
failureThreshold: 10
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
securityContext:
capabilities:
add:
- NET_BIND_SERVICE
drop:
- all
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false
restartPolicy: Always
terminationGracePeriodSeconds: 30
dnsPolicy: Default
nodeSelector:
beta.kubernetes.io/os: linux
serviceAccountName: coredns
serviceAccount: coredns
securityContext: {}
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: node-role.kubernetes.io/master
operator: In
values:
- ''
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
k8s-app: kube-dns
topologyKey: kubernetes.io/hostname
schedulerName: default-scheduler
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
- key: CriticalAddonsOnly
operator: Exists
priorityClassName: system-cluster-critical
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 10%
revisionHistoryLimit: 10
progressDeadlineSeconds: 600
status:
observedGeneration: 2
replicas: 2
updatedReplicas: 2
readyReplicas: 1
availableReplicas: 1
unavailableReplicas: 1
conditions:
- type: Progressing
status: 'True'
lastUpdateTime: '2020-01-24T16:14:42Z'
lastTransitionTime: '2020-01-24T16:14:37Z'
reason: NewReplicaSetAvailable
message: ReplicaSet "coredns-58687784f9" has successfully progressed.
- type: Available
status: 'False'
lastUpdateTime: '2020-01-27T17:42:57Z'
lastTransitionTime: '2020-01-27T17:42:57Z'
reason: MinimumReplicasUnavailable
message: Deployment does not have minimum availability.
Deployment for webservices:
kind: Deployment
apiVersion: apps/v1
metadata:
name: webservices
namespace: default
selfLink: /apis/apps/v1/namespaces/default/deployments/webservices
uid: da75d3d8-92f4-4d06-86d6-e2fb325806a5
resourceVersion: '398529'
generation: 1
creationTimestamp: '2020-01-27T08:05:16Z'
labels:
run: webservices
annotations:
deployment.kubernetes.io/revision: '1'
spec:
replicas: 5
selector:
matchLabels:
run: webservices
template:
metadata:
creationTimestamp: null
labels:
run: webservices
spec:
containers:
- name: webservices
image: nginx
ports:
- containerPort: 80
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: Always
restartPolicy: Always
terminationGracePeriodSeconds: 30
dnsPolicy: ClusterFirst
securityContext: {}
schedulerName: default-scheduler
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 25%
maxSurge: 25%
revisionHistoryLimit: 10
progressDeadlineSeconds: 600
status:
observedGeneration: 1
replicas: 5
updatedReplicas: 5
unavailableReplicas: 5
conditions:
- type: Available
status: 'False'
lastUpdateTime: '2020-01-27T08:05:16Z'
lastTransitionTime: '2020-01-27T08:05:16Z'
reason: MinimumReplicasUnavailable
message: Deployment does not have minimum availability.
- type: Progressing
status: 'False'
lastUpdateTime: '2020-01-27T17:52:58Z'
lastTransitionTime: '2020-01-27T17:52:58Z'
reason: ProgressDeadlineExceeded
message: ReplicaSet "webservices-8675d4667d" has timed out progressing.
Finally, I decided to reinstall the nodes from Debian 10 to Ubuntu 18.04, and now everything works as expected.
Thank you for your time.
The problem is that kube-proxy isn't functioning correctly. I believe 10.233.0.1 is the Kubernetes API service address, which kube-proxy is responsible for configuring/setting up. You should check the kube-proxy logs and verify that it is healthy and creating the iptables rules for the Kubernetes services.
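For example, a minimal sketch of those checks (the k8s-app=kube-proxy label is the usual one on kubeadm/Kubespray clusters and may differ on yours; 10.233.0.1 is the API service IP from the error above):
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=100
# On the affected node, confirm iptables has rules for the kubernetes API service IP
sudo iptables-save | grep 10.233.0.1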
Take a look here: calico-timeout-pod.
I had to set the following on the worker node as well, before joining it, for it to work:
sudo sysctl net.bridge.bridge-nf-call-iptables=1
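To keep that setting across reboots, it can also be persisted (a sketch; the file name under /etc/sysctl.d is arbitrary):
echo 'net.bridge.bridge-nf-call-iptables = 1' | sudo tee /etc/sysctl.d/99-kubernetes.conf
sudo sysctl --system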
I was having a similar issue. I am using microk8s in my instance; it seems the node needs to advertise itself to the cluster. I hope this points you in the right direction (repost from GitHub):
microk8s stop
# or for workers: sudo snap stop microk8s
sudo vim.tiny /var/snap/microk8s/current/args/kubelet
# Add this to bottom: --node-ip=<this-specific-node-lan-ip>
sudo vim.tiny /var/snap/microk8s/current/args/kube-apiserver
# Add this to bottom: --advertise-address=<this-specific-node-lan-ip>
microk8s start
# or for workers: sudo snap start microk8s
Related
I created a Deployment, Service and an Ingress. Unfortunately, the ingress-nginx-controller pods are complaining that my Service does not have an Active Endpoint:
controller.go:920] Service "<namespace>/web-server" does not have any active Endpoint.
My Service definition:
apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/should_be_scraped: "false"
creationTimestamp: "2021-06-22T07:07:18Z"
labels:
chart: <namespace>-core-1.9.2
release: <namespace>
name: web-server
namespace: <namespace>
resourceVersion: "9050796"
selfLink: /api/v1/namespaces/<namespace>/services/web-server
uid: 82b3c3b4-a181-4ba2-887a-a4498346bc81
spec:
clusterIP: 10.233.56.52
ports:
- name: http
port: 80
protocol: TCP
targetPort: 8080
selector:
app: web-server
sessionAffinity: None
type: ClusterIP
status:
loadBalancer: {}
My Deployment definition:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
creationTimestamp: "2021-06-22T07:07:19Z"
generation: 1
labels:
app: web-server
chart: <namespace>-core-1.9.2
release: <namespace>
name: web-server
namespace: <namespace>
resourceVersion: "9051062"
selfLink: /apis/apps/v1/namespaces/<namespace>/deployments/web-server
uid: fb085727-9e8a-4931-8067-fd4ed410b8ca
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: web-server
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: web-server
spec:
containers:
- env:
<removed environment variables>
image: <url>/<namespace>/web-server:1.10.1
imagePullPolicy: IfNotPresent
name: web-server
ports:
- containerPort: 8080
name: http
protocol: TCP
- containerPort: 8082
name: metrics
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /actuator/health
port: 8080
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
memory: 1Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /config
name: <namespace>-config
dnsPolicy: ClusterFirst
hostAliases:
- hostnames:
- <url>
ip: 10.0.1.178
imagePullSecrets:
- name: registry-pull-secret
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
volumes:
- configMap:
defaultMode: 420
name: <namespace>-config
name: <namespace>-config
status:
conditions:
- lastTransitionTime: "2021-06-22T07:07:19Z"
lastUpdateTime: "2021-06-22T07:07:19Z"
message: Deployment does not have minimum availability.
reason: MinimumReplicasUnavailable
status: "False"
type: Available
- lastTransitionTime: "2021-06-22T07:17:20Z"
lastUpdateTime: "2021-06-22T07:17:20Z"
message: ReplicaSet "web-server-6df6d6565b" has timed out progressing.
reason: ProgressDeadlineExceeded
status: "False"
type: Progressing
observedGeneration: 1
replicas: 1
unavailableReplicas: 1
updatedReplicas: 1
In the same namespace I have more Service and Deployment resources; all of them work except this one (and another, see below).
# kubectl get endpoints -n <namespace>
NAME ENDPOINTS AGE
activemq 10.233.64.3:61613,10.233.64.3:8161,10.233.64.3:61616 + 1 more... 26d
content-backend 10.233.96.17:8080 26d
datastore3 10.233.96.16:8080 26d
web-server 74m
web-server-metrics 26d
As you can see, the selector/label (web-server) is the same in the Service as well as in the Deployment definition.
C-Nan solved the problem and posted the solution as a comment:
I found the issue. The Pod was started, but not in Ready state due to a failing readinessProbe. I wasn't aware that an endpoint wouldn't be created until the Pod is in Ready state. Removing the readinessProbe created the Endpoint.
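A quick way to see this relationship (a sketch using the names from the manifests above; replace <namespace> accordingly):
kubectl -n <namespace> get pods -l app=web-server        # READY shows 0/1 while the readinessProbe fails
kubectl -n <namespace> describe pods -l app=web-server   # Events list the probe failures
kubectl -n <namespace> get endpoints web-server          # ENDPOINTS stays empty until the Pod is Ready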
From the status of your Deployment, it seems that no pod is running for the Deployment.
status:
conditions:
- lastTransitionTime: "2021-06-22T07:07:19Z"
lastUpdateTime: "2021-06-22T07:07:19Z"
message: Deployment does not have minimum availability.
reason: MinimumReplicasUnavailable
status: "False"
type: Available
- lastTransitionTime: "2021-06-22T07:17:20Z"
lastUpdateTime: "2021-06-22T07:17:20Z"
message: ReplicaSet "web-server-6df6d6565b" has timed out progressing.
reason: ProgressDeadlineExceeded
status: "False"
type: Progressing
observedGeneration: 1
replicas: 1
unavailableReplicas: 1
updatedReplicas: 1
The unavailableReplicas: 1 field indicates that the desired pod is not available. As a result, the Service has no active Endpoint.
You can describe the deployment to see why the pod is unavailable.
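For example (a sketch with the names from the manifests above; replace <namespace> accordingly):
kubectl -n <namespace> describe deployment web-server
# Drill down into the ReplicaSet and the pods it created
kubectl -n <namespace> describe rs -l app=web-server
kubectl -n <namespace> get pods -l app=web-server -o wide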
I want my prometheus server to scrape metrics from a pod.
I followed these steps:
Created a pod using deployment - kubectl apply -f sample-app.deploy.yaml
Exposed the same using kubectl apply -f sample-app.service.yaml
Deployed Prometheus server using helm upgrade -i prometheus prometheus-community/prometheus -f prometheus-values.yaml
created a serviceMonitor using kubectl apply -f service-monitor.yaml to add a target for prometheus.
All pods are running, but when I open the Prometheus dashboard, I don't see the sample-app service as a Prometheus target under Status > Targets in the dashboard UI.
I've verified the following (roughly with the commands sketched after this list):
I can see sample-app when I execute kubectl get servicemonitors
I can see that sample-app exposes metrics in Prometheus format at /metrics
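Roughly, those checks looked like this (a sketch; the prom namespace and the sample-app Service name/port 8080 are assumptions based on the ServiceMonitor and scrape config shown further down, so adjust them to your setup):
kubectl get servicemonitors -n prom
kubectl -n prom port-forward svc/sample-app 8080:8080 &
sleep 2
curl -s http://localhost:8080/metrics | head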
At this point I debugged further, entered into the prometheus pod using
kubectl exec -it pod/prometheus-server-65b759cb95-dxmkm -c prometheus-server sh
, and saw that the Prometheus configuration (/etc/config/prometheus.yml) didn't have sample-app as one of the jobs, so I edited the configmap using
kubectl edit cm prometheus-server -o yaml
Added
- job_name: sample-app
static_configs:
- targets:
- sample-app:8080
Assuming all other fields, such as the scrape interval and scrape_timeout, stay at their defaults.
I can see the same change reflected in /etc/config/prometheus.yml, but the Prometheus dashboard still doesn't show sample-app as a target under Status > Targets.
Following are the YAMLs for prometheus-server and the ServiceMonitor.
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
autopilot.gke.io/resource-adjustment: '{"input":{"containers":[{"name":"prometheus-server-configmap-reload"},{"name":"prometheus-server"}]},"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server-configmap-reload"},{"limits":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"2Gi"},"name":"prometheus-server"}]},"modified":true}'
deployment.kubernetes.io/revision: "1"
meta.helm.sh/release-name: prometheus
meta.helm.sh/release-namespace: prom
creationTimestamp: "2021-06-24T10:42:31Z"
generation: 1
labels:
app: prometheus
app.kubernetes.io/managed-by: Helm
chart: prometheus-14.2.1
component: server
heritage: Helm
release: prometheus
name: prometheus-server
namespace: prom
resourceVersion: "6983855"
selfLink: /apis/apps/v1/namespaces/prom/deployments/prometheus-server
uid: <some-uid>
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: prometheus
component: server
release: prometheus
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: prometheus
chart: prometheus-14.2.1
component: server
heritage: Helm
release: prometheus
spec:
containers:
- args:
- --volume-dir=/etc/config
- --webhook-url=http://127.0.0.1:9090/-/reload
image: jimmidyson/configmap-reload:v0.5.0
imagePullPolicy: IfNotPresent
name: prometheus-server-configmap-reload
resources:
limits:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
requests:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
securityContext:
capabilities:
drop:
- NET_RAW
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/config
name: config-volume
readOnly: true
- args:
- --storage.tsdb.retention.time=15d
- --config.file=/etc/config/prometheus.yml
- --storage.tsdb.path=/data
- --web.console.libraries=/etc/prometheus/console_libraries
- --web.console.templates=/etc/prometheus/consoles
- --web.enable-lifecycle
image: quay.io/prometheus/prometheus:v2.26.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /-/healthy
port: 9090
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 15
successThreshold: 1
timeoutSeconds: 10
name: prometheus-server
ports:
- containerPort: 9090
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /-/ready
port: 9090
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 4
resources:
limits:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
requests:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
securityContext:
capabilities:
drop:
- NET_RAW
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/config
name: config-volume
- mountPath: /data
name: storage-volume
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 65534
runAsGroup: 65534
runAsNonRoot: true
runAsUser: 65534
seccompProfile:
type: RuntimeDefault
serviceAccount: prometheus-server
serviceAccountName: prometheus-server
terminationGracePeriodSeconds: 300
volumes:
- configMap:
defaultMode: 420
name: prometheus-server
name: config-volume
- name: storage-volume
persistentVolumeClaim:
claimName: prometheus-server
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2021-06-24T10:43:25Z"
lastUpdateTime: "2021-06-24T10:43:25Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2021-06-24T10:42:31Z"
lastUpdateTime: "2021-06-24T10:43:25Z"
message: ReplicaSet "prometheus-server-65b759cb95" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 1
readyReplicas: 1
replicas: 1
updatedReplicas: 1
YAML for the ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"creationTimestamp":"2021-06-24T07:55:58Z","generation":1,"labels":{"app":"sample-app","release":"prometheus"},"name":"sample-app","namespace":"prom","resourceVersion":"6884573","selfLink":"/apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app","uid":"34644b62-eb4f-4ab1-b9df-b22811e40b4c"},"spec":{"endpoints":[{"port":"http"}],"selector":{"matchLabels":{"app":"sample-app","release":"prometheus"}}}}
creationTimestamp: "2021-06-24T07:55:58Z"
generation: 2
labels:
app: sample-app
release: prometheus
name: sample-app
namespace: prom
resourceVersion: "6904642"
selfLink: /apis/monitoring.coreos.com/v1/namespaces/prom/servicemonitors/sample-app
uid: <some-uid>
spec:
endpoints:
- port: http
selector:
matchLabels:
app: sample-app
release: prometheus
You need to use the prometheus-community/kube-prometheus-stack chart, which includes the Prometheus operator, in order to have Prometheus' configuration update automatically based on ServiceMonitor resources.
The prometheus-community/prometheus chart you used does not include the Prometheus operator that watches for ServiceMonitor resources in the Kubernetes API and updates the Prometheus server's ConfigMap accordingly.
It seems that you have the necessary CustomResourceDefinitions (CRDs) installed in your cluster; otherwise you would not have been able to create a ServiceMonitor resource. These are not included in the prometheus-community/prometheus chart, so perhaps they were added to your cluster previously.
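For reference, a minimal sketch of switching to the operator-based chart (the release name prometheus and namespace prom are taken from your commands; note that values for prometheus-community/prometheus do not map 1:1 onto kube-prometheus-stack):
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade -i prometheus prometheus-community/kube-prometheus-stack -n prom
Also note that, by default, kube-prometheus-stack only picks up ServiceMonitor resources labeled with the Helm release name (release: prometheus), which your ServiceMonitor already carries.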
I have a Kubernetes Deployment with a Go app on Kubernetes 1.17 on GKE. It has CPU and memory requests and limits, and 1 replica is specified in the Deployment.
Furthermore, I have this HPA (I have autoscaling/v2beta2 defined in my Helm chart, but GKE apparently converts it to v2beta1):
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
annotations:
meta.helm.sh/release-name: servicename
meta.helm.sh/release-namespace: namespace
creationTimestamp: "2021-02-15T11:30:18Z"
labels:
app.kubernetes.io/managed-by: Helm
name: servicename-service
namespace: namespace
resourceVersion: "123"
selfLink: link
uid: uid
spec:
maxReplicas: 10
metrics:
- resource:
name: memory
targetAverageUtilization: 80
type: Resource
- resource:
name: cpu
targetAverageUtilization: 80
type: Resource
minReplicas: 1
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: servicename-service
status:
conditions:
- lastTransitionTime: "2021-02-15T11:30:33Z"
message: recommended size matches current size
reason: ReadyForNewScale
status: "True"
type: AbleToScale
- lastTransitionTime: "2021-02-15T13:17:20Z"
message: the HPA was able to successfully calculate a replica count from cpu resource
utilization (percentage of request)
reason: ValidMetricFound
status: "True"
type: ScalingActive
- lastTransitionTime: "2021-02-15T13:17:36Z"
message: the desired count is within the acceptable range
reason: DesiredWithinRange
status: "False"
type: ScalingLimited
currentMetrics:
- resource:
currentAverageUtilization: 14
currentAverageValue: "9396224"
name: memory
type: Resource
- resource:
currentAverageUtilization: 33
currentAverageValue: 84m
name: cpu
type: Resource
currentReplicas: 3
desiredReplicas: 3
lastScaleTime: "2021-02-15T13:40:11Z"
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "456"
meta.helm.sh/release-name: servicename-service
meta.helm.sh/release-namespace: services
creationTimestamp: "2021-02-11T10:00:45Z"
generation: 129
labels:
app: servicename
app.kubernetes.io/managed-by: Helm
chart: servicename
heritage: Helm
release: servicename-service
name: servicename-service-servicename
namespace: namespace
resourceVersion: "123"
selfLink: /apis/apps/v1/namespaces/namespace/deployments/servicename-service-servicename
uid: b1fcc8c6-f3e6-4bbf-92a1-d7ae1e2bb188
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: servicename
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: servicename
release: servicename-service
spec:
containers:
envFrom:
- configMapRef:
name: servicename-service-servicename
image: image
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /health/liveness
port: 8888
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: servicename
ports:
- containerPort: 8888
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /health/readiness
port: 8888
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: 500m
memory: 256Mi
requests:
cpu: 150m
memory: 64Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
status:
availableReplicas: 3
conditions:
- lastTransitionTime: "2021-02-11T10:00:45Z"
lastUpdateTime: "2021-02-16T14:10:29Z"
message: ReplicaSet "servicename-service-servicename-5b6445fcb" has
successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2021-02-20T16:19:51Z"
lastUpdateTime: "2021-02-20T16:19:51Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 129
readyReplicas: 3
replicas: 3
updatedReplicas: 3
Output of kubectl get hpa --all-namespaces
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
namespace servicename-service Deployment/servicename-service 9%/80%, 1%/80% 1 10 2 6d
namespace xyz-service Deployment/xyz-service 18%/80%, 1%/80% 1 10 1 6d
I haven't changed any kube-controller-manager defaults such as --horizontal-pod-autoscaler-downscale-stabilization.
Question:
Why is it not scaling down to 1 replica when the currentAverageUtilization for CPU is 33 and the target is 80? I waited for more than an hour.
Any ideas?
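For reference, a sketch of the calculation the HPA performs (the formula is the documented HPA one; 3 replicas and the 33/14 utilization figures come from the status above):
# desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )
# CPU:    ceil( 3 * 33 / 80 ) = ceil(1.24) = 2
# Memory: ceil( 3 * 14 / 80 ) = ceil(0.53) = 1
# The HPA takes the highest recommendation across metrics (2 here), and scale-downs are
# additionally delayed by the default 5-minute downscale stabilization window.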
I am using Traefik 2.2.1 as my cluster's entrypoint, deployed as a Deployment. This is my deployment config:
kind: Deployment
apiVersion: apps/v1
metadata:
name: traefik
namespace: kube-system
selfLink: /apis/apps/v1/namespaces/kube-system/deployments/traefik
uid: ddee327d-8570-44be-ab8d-06cb440187f4
resourceVersion: '335024'
generation: 12
creationTimestamp: '2020-06-04T07:37:20Z'
labels:
app.kubernetes.io/instance: traefik
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: traefik
helm.sh/chart: traefik-8.2.1
annotations:
deployment.kubernetes.io/revision: '7'
meta.helm.sh/release-name: traefik
meta.helm.sh/release-namespace: kube-system
spec:
replicas: 4
selector:
matchLabels:
app.kubernetes.io/instance: traefik
app.kubernetes.io/name: traefik
template:
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/instance: traefik
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: traefik
helm.sh/chart: traefik-8.2.1
spec:
volumes:
- name: data
emptyDir: {}
containers:
- name: traefik
image: 'traefik:2.2.1'
args:
- '--global.checknewversion'
- '--global.sendanonymoususage'
- '--entryPoints.traefik.address=:9000'
- '--entryPoints.web.address=:80'
- '--entryPoints.websecure.address=:443'
- '--api.dashboard=true'
- '--ping=true'
- '--providers.kubernetescrd'
- '--providers.kubernetesingress'
ports:
- name: traefik
containerPort: 9000
protocol: TCP
- name: web
containerPort: 8000
protocol: TCP
- name: websecure
containerPort: 8443
protocol: TCP
resources: {}
volumeMounts:
- name: data
mountPath: /data
livenessProbe:
httpGet:
path: /ping
port: 9000
scheme: HTTP
initialDelaySeconds: 10
timeoutSeconds: 2
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
readinessProbe:
httpGet:
path: /ping
port: 9000
scheme: HTTP
initialDelaySeconds: 10
timeoutSeconds: 2
periodSeconds: 10
successThreshold: 1
failureThreshold: 1
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
securityContext:
capabilities:
drop:
- ALL
runAsUser: 65532
runAsGroup: 65532
runAsNonRoot: true
readOnlyRootFilesystem: true
restartPolicy: Always
terminationGracePeriodSeconds: 60
dnsPolicy: ClusterFirst
serviceAccountName: traefik
serviceAccount: traefik
securityContext:
fsGroup: 65532
schedulerName: default-scheduler
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 1
revisionHistoryLimit: 10
progressDeadlineSeconds: 600
status:
observedGeneration: 12
replicas: 5
updatedReplicas: 2
readyReplicas: 3
availableReplicas: 3
unavailableReplicas: 2
conditions:
- type: Available
status: 'True'
lastUpdateTime: '2020-06-04T08:41:03Z'
lastTransitionTime: '2020-06-04T08:41:03Z'
reason: MinimumReplicasAvailable
message: Deployment has minimum availability.
- type: Progressing
status: 'True'
lastUpdateTime: '2020-06-04T10:57:35Z'
lastTransitionTime: '2020-06-04T10:48:40Z'
reason: ReplicaSetUpdated
message: ReplicaSet "traefik-dd74b59b" is progressing.
My question is: is it possible to make Traefik listen on the host's ports 80 and 443? If so, how? Or should I change the deployment type to a DaemonSet? If neither is possible, I will have to deploy an nginx on each node to forward traffic.
Add hostNetwork: true to the pod spec. This makes the pod use the host's network namespace.
...
spec:
hostNetwork: true
containers:
- name: traefik
...
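One way to confirm the change took effect (a sketch; assumes SSH access to a node that runs a Traefik pod):
kubectl -n kube-system get pods -l app.kubernetes.io/name=traefik -o wide
# On one of those nodes, check that Traefik is now bound to the host's ports 80 and 443
sudo ss -lntp | grep -E ':(80|443)\s'
If the pod also needs to resolve in-cluster Service names while on the host network, dnsPolicy: ClusterFirstWithHostNet is typically set alongside hostNetwork: true.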
I upgraded my GKE clusters to Kubernetes 1.5.6 a couple of days ago. I used to be able to scale the l7-default-backend deployment to 3 replicas and increase the CPU resource limits, but now it seems my changes get reverted to the defaults of 1 replica and a 10m CPU limit.
Interestingly enough, when I added the following:
nodeSelector:
cloud.google.com/gke-nodepool: default-pool
it was persisted without a problem.
Here's the current deployment manifest:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "4"
kubectl.kubernetes.io/last-applied-configuration: '{"kind":"Deployment","apiVersion":"extensions/v1beta1","metadata":{"name":"l7-default-backend","namespace":"kube-system","creationTimestamp":null,"labels":{"k8s-app":"glbc","kubernetes.io/cluster-service":"true","kubernetes.io/name":"GLBC"}},"spec":{"replicas":1,"selector":{"matchLabels":{"k8s-app":"glbc"}},"template":{"metadata":{"creationTimestamp":null,"labels":{"k8s-app":"glbc","name":"glbc"}},"spec":{"containers":[{"name":"default-http-backend","image":"gcr.io/google_containers/defaultbackend:1.0","ports":[{"containerPort":8080}],"resources":{"limits":{"cpu":"10m","memory":"20Mi"},"requests":{"cpu":"10m","memory":"20Mi"}},"livenessProbe":{"httpGet":{"path":"/healthz","port":8080,"scheme":"HTTP"},"initialDelaySeconds":30,"timeoutSeconds":5}}]}},"strategy":{}},"status":{}}'
creationTimestamp: 2017-03-23T23:30:12Z
generation: 9
labels:
k8s-app: glbc
kubernetes.io/cluster-service: "true"
kubernetes.io/name: GLBC
name: l7-default-backend
namespace: kube-system
resourceVersion: "40149922"
selfLink: /apis/extensions/v1beta1/namespaces/kube-system/deployments/l7-default-backend
uid: a9772d26-1020-11e7-b9a8-42010af001d0
spec:
replicas: 1
selector:
matchLabels:
k8s-app: glbc
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
k8s-app: glbc
name: glbc
spec:
containers:
- image: gcr.io/google_containers/defaultbackend:1.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: default-http-backend
ports:
- containerPort: 8080
protocol: TCP
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
terminationMessagePath: /dev/termination-log
dnsPolicy: ClusterFirst
nodeSelector:
cloud.google.com/gke-nodepool: default-pool
restartPolicy: Always
securityContext: {}
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: 2017-04-13T19:19:35Z
lastUpdateTime: 2017-04-13T19:19:35Z
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 9
replicas: 1
updatedReplicas: 1
How can I scale the l7-default-backend successfully?