Kubernetes autoscale having issues with heapster

I have Heapster installed on Kubernetes and I am trying to autoscale my pods, but I keep seeing the following:
unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
The Heapster service itself logs the following:
I1009 14:22:21.014890 1 heapster.go:73] Heapster version v1.4.2
I1009 14:22:21.015226 1 configs.go:61] Using Kubernetes client with master "https://kubernetes.default" and version v1
I1009 14:22:21.015244 1 configs.go:62] Using kubelet port 10250
I1009 14:22:21.030070 1 heapster.go:196] Starting with Metric Sink
I1009 14:22:21.042806 1 heapster.go:106] Starting heapster on port 8082
E1009 14:30:05.000311 1 kubelet.go:280] Node ip-xxxxxx.eu-west-1.compute.internal is not ready
E1009 14:30:05.000342 1 kubelet.go:280] Node ip-xxxxxx.eu-west-1.compute.internal is not ready
E1009 14:30:05.000351 1 kubelet.go:280] Node ip-xxxxxx.eu-west-1.compute.internal is not ready
E1009 14:30:05.000357 1 kubelet.go:280] Node ip-xxxxxx.eu-west-1.compute.internal is not ready
E1009 14:30:05.000363 1 kubelet.go:280] Node ip-xxxxxx.eu-west-1.compute.internal is not ready
E1009 14:30:05.000370 1 kubelet.go:280] Node ip-xxxxxx.eu-west-1.compute.internal is not ready
I want to have autoscaling working with Kubernetes. Does anyone have any ideas?
I know Heapster is deprecated, but as of now I cannot change or upgrade to metrics-server, so I could do with some help.

I have seen a similar error once. The problem was an RBAC issue, but the error log was misleading. Make sure you have granted get permission for the pods resource in the metrics.k8s.io API group.
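For illustration only, such a permission could look roughly like the manifest below. The names and the bound ServiceAccount are assumptions; bind the role to whatever account actually queries the metrics API in your cluster.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metrics-reader              # illustrative name
rules:
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods", "nodes"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-reader              # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: metrics-reader
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler   # illustrative; use the account that needs to read the metrics
  namespace: kube-system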
Check this similar question: Kubernetes Custom CRD: “Failed to list …: the server could not find the requested resource”
Note: Fetching metrics from Heapster is deprecated as of Kubernetes 1.11. Ref: Horizontal Pod Autoscaler

Related

Metrics-Server: Node had no addresses that matched types [InternalIP]

I'm using Rancher 2.5.8 to manage my Kubernetes clusters. Today, I created a new cluster and everything worked as expected, except the metrics-server. The status of the metrics-server is always "CrashLoopBackOff" and the logs are telling me the following:
E0519 11:46:39.225804 1 server.go:132] unable to fully scrape metrics: [unable to fully scrape metrics from node worker1: unable to fetch metrics from node worker1: unable to extract connection information for node "worker1": node worker1 had no addresses that matched types [InternalIP], unable to fully scrape metrics from node worker2: unable to fetch metrics from node worker2: unable to extract connection information for node "worker2": node worker2 had no addresses that matched types [InternalIP], unable to fully scrape metrics from node worker3: unable to fetch metrics from node worker3: unable to extract connection information for node "worker3": node worker3 had no addresses that matched types [InternalIP], unable to fully scrape metrics from node main1: unable to fetch metrics from node main1: unable to extract connection information for node "main1": node main1 had no addresses that matched types [InternalIP]]
I0519 11:46:39.228205 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0519 11:46:39.228222 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0519 11:46:39.228290 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0519 11:46:39.228301 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0519 11:46:39.228310 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0519 11:46:39.228314 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0519 11:46:39.229241 1 secure_serving.go:197] Serving securely on [::]:4443
I0519 11:46:39.229280 1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0519 11:46:39.229302 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0519 11:46:39.328399 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0519 11:46:39.328428 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0519 11:46:39.328505 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
Does anyone have any idea how I can solve this issue so that the metrics-server stops crashing?
Here's the relevant output of kubectl get nodes worker1 -oyaml:
status:
  addresses:
  - address: worker1
    type: Hostname
  - address: 65.21.<any>.<ip>
    type: ExternalIP
The issue was with the metrics-server.
The metrics-server was configured to use --kubelet-preferred-address-types=InternalIP, but the worker nodes didn't have any InternalIP listed:
$ kubectl get nodes worker1 -oyaml
[...]
status:
  addresses:
  - address: worker1
    type: Hostname
  - address: 65.21.<any>.<ip>
    type: ExternalIP
The solution was to set --kubelet-preferred-address-types=ExternalIP in the metrics-server Deployment YAML.
But probably a better solution would be to configure it as in the official metrics-server deployment YAML (source):
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
As stated in the metrics-server configuration docs:
--kubelet-preferred-address-types - The priority of node address types used when determining an address for connecting to a particular node (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
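For illustration, this is roughly where the flag lives in the metrics-server Deployment. This is a sketch only; the image tag and the other flags may differ in your metrics-server version.
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server/metrics-server:v0.4.3   # illustrative tag
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname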

Istio Installation successful but not able to deploy POD

I have successfully installed Istio in a Kubernetes cluster.
Istio version is 1.9.1.
Kubernetes CNI plugin used: Calico version 3.18 (the Calico pod is up and running).
kubectl get pod -A
istio-system istio-egressgateway-bd477794-8rnr6 1/1 Running 0 124m
istio-system istio-ingressgateway-79df7c789f-fjwf8 1/1 Running 0 124m
istio-system istiod-6dc55bbdd-89mlv 1/1 Running 0 124
When I try to deploy a sample nginx app, I get the error below:
failed calling webhook "sidecar-injector.istio.io": Post "https://istiod.istio-system.svc:443/inject?timeout=30s": context deadline exceeded
When I disable automatic proxy sidecar injection, the pod is deployed without any errors:
kubectl label namespace default istio-injection-
I am not sure how to fix this issue. Could someone please help me with this?
In this case, adding hostNetwork: true under spec.template.spec in the istiod Deployment may help (see the sketch below).
This seems to be a workaround when using the Calico CNI for pod networking (see: failed calling webhook "sidecar-injector.istio.io").
As we can find in the Kubernetes Host namespaces documentation:
HostNetwork - Controls whether the pod may use the node network namespace. Doing so gives the pod access to the loopback device, services listening on localhost, and could be used to snoop on network activity of other pods on the same node.
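A minimal sketch of that change, assuming the stock istiod Deployment from Istio 1.9 (all other fields left untouched):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: istiod
  namespace: istio-system
spec:
  template:
    spec:
      hostNetwork: true   # workaround so the sidecar-injection webhook is reachable with Calico
Editing the pod template triggers a rollout, so a new istiod pod is created with the setting applied.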

HPA not able to fetch metrics from Prometheus in Kubernetes

I have a small Kubernetes cluster: one master node and two worker nodes. For monitoring purposes, I have deployed Prometheus and Grafana. Now, I want to autoscale pods based on CPU usage. But even after configuring Grafana and Prometheus, I am getting the following error:
Name: php-apache
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Mon, 17 Jun 2019 12:33:01 +0530
Reference: Deployment/php-apache
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): <unknown> / 50%
Min replicas: 1
Max replicas: 10
Deployment pods: 1 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 112s (x12408 over 2d4h) horizontal-pod-autoscaler unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Can anybody let me know why Kubernetes is not fetching metrics from Prometheus?
Kubernetes retrieves metrics from either the metrics.k8s.io API (normally implemented by the metrics-server, which can be installed separately) or the custom.metrics.k8s.io API (which can serve any type of metric and is normally provided by third parties). To use Prometheus with the HPA in Kubernetes, the Prometheus Adapter for the custom metrics API needs to be installed.
A walkthrough for the setup can be found here.
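Once the adapter is serving the custom.metrics.k8s.io API, an HPA can reference a Prometheus-backed metric. A rough sketch (the metric name and target value are made up for illustration and must match a metric the adapter actually exposes):
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: http_requests_per_second   # illustrative custom metric
      targetAverageValue: 10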
Heapster is now deprecated: https://github.com/kubernetes-retired/heapster
To enable auto-scaling on your cluster you can use the HPA (Horizontal Pod Autoscaler), and you can also install the metrics-server to provide the resource metrics (a minimal HPA manifest is sketched after the links below).
To install the metrics-server on Kubernetes, you can follow these guides:
Amazon EKS: https://docs.aws.amazon.com/eks/latest/userguide/metrics-server.html
https://github.com/kubernetes-incubator/metrics-server
https://medium.com/@cagri.ersen/kubernetes-metrics-server-installation-d93380de008
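For plain CPU-based scaling with the metrics-server in place, a minimal HPA manifest could look like the sketch below (reusing the php-apache names from the question):
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50   # same 50% target shown in the question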
You don't need custom metrics to use HPA for auto-scaling pods based on their CPU usage.
As @Blokje5 mentioned earlier, you just need to install 'kube-state-metrics'.
The most convenient way to do it is with a dedicated helm chart (kube-state-metrics).
Hint: use override parameters with 'helm install' to create a ServiceMonitor object for the 'kube-state-metrics' Pod, so that Prometheus can discover a new target for metrics scraping, e.g.:
helm install stable/kube-state-metrics --set prometheus.monitor.enabled=true
Remark: Pay attention to the 'serviceMonitorSelector' defined in your existing Prometheus resource object/configuration, so that it matches the ServiceMonitor definition for 'kube-state-metrics'. This is what makes the Pods' metrics available in the Prometheus console.
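For illustration, the matching piece of the Prometheus resource could look roughly like this. The label key and value are assumptions; use whatever labels your kube-state-metrics ServiceMonitor actually carries.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceMonitorSelector:
    matchLabels:
      release: kube-state-metrics   # illustrative; must match the ServiceMonitor's labels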

Kubernetes Metrics unable to fetch pod/node metrics

I've installed metrics-server on Kubernetes v1.11.2.
I'm running a bare-metal cluster with 3 worker nodes and 1 master.
In the metrics-server log I have the following errors:
E0907 14:29:51.774592 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:vps01: unable to fetch metrics from Kubelet vps01 (vps01): Get https://vps01:10250/stats/summary/: dial tcp: lookup vps01 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps04: unable to fetch metrics from Kubelet vps04 (vps04): Get https://vps04:10250/stats/summary/: dial tcp: lookup vps04 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps03: unable to fetch metrics from Kubelet vps03 (vps03): Get https://vps03:10250/stats/summary/: dial tcp: lookup vps03 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps02: unable to fetch metrics from Kubelet vps02 (vps02): Get https://vps02:10250/stats/summary/: dial tcp: lookup vps02 on 10.96.0.10:53: no such host]
E0907 14:30:01.694794 1 reststorage.go:98] unable to fetch pod metrics for pod boxweb/boxweb-deployment-7756c49688-fz625: no metrics known for pod "boxweb/boxweb-deployment-7756c49688-fz625"
E0907 14:30:10.517886 1 reststorage.go:112] unable to fetch node metrics for node "vps01": no metrics known for node "vps01"
I also can't get any metrics using:
kubectl top node vps01
The same goes for autoscaling; it is not working:
unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
I found the following solution:
Change the metrics-server-deployment.yaml file and add:
command:
- /metrics-server
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls
It looks like you have a DNS issue in your metrics-server pod. You can connect to the pod to check:
kubectl exec -it metrics-server-xxxxxxxxxx-xxxxx -n kube-system sh
/ # ping vps01
If you can't ping it, you can't resolve your node's name.
CoreDNS or kube-dns uses the /etc/resolv.conf on each of your nodes too, so I would check whether the nodes can resolve each other. Say, can you ping vps01 from vps02 or vps03, etc.?
I had the same issue and resolved it by adding the hostnames to /etc/hosts on every node.
To collect metric data (CPU/memory usage), the metrics-server tries to access the nodes. However, it cannot resolve the hostnames (vps01, vps02, vps03, and vps04) because they are not registered in DNS. As you mentioned, you cannot register the hostnames in DNS.
So you must add the hostnames to /etc/hosts on the node where the metrics-server pod is running, for example:
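A sketch of such /etc/hosts entries (the IP addresses are placeholders; use the real node IPs):
# /etc/hosts on the node(s) running the metrics-server pod
10.0.0.11  vps01
10.0.0.12  vps02
10.0.0.13  vps03
10.0.0.14  vps04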
The autoscaler does not work because the metrics-server is not working and there is no metric data.

Kubernetes monitoring service heapster keeps restarting

I am running a Kubernetes cluster using Azure's container engine. I have an issue with one of the Kubernetes services, the one that does resource monitoring: Heapster. The pod is relaunched every minute or so. I have tried removing the Heapster deployment, replica set and pods, and recreating the deployment. It goes back to the same behaviour instantly.
When I look at the resources with the heapster label it looks a little bit weird:
$ kubectl get deploy,rs,po -l k8s-app=heapster --namespace=kube-system
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/heapster 1 1 1 1 17h
NAME DESIRED CURRENT READY AGE
rs/heapster-2708163903 1 1 1 17h
rs/heapster-867061013 0 0 0 17h
NAME READY STATUS RESTARTS AGE
po/heapster-2708163903-vvs1d 2/2 Running 0 0s
For some reason there are two replica sets. The one called rs/heapster-867061013 keeps reappearing even when I delete all of the resources and redeploy them. The output above also shows that the pod has just started, and this is the issue: it keeps getting created, runs for a few seconds, and then a new one is created. I am new to running Kubernetes, so I am unsure which log files are relevant to this issue.
Logs from heapster container
heapster.go:72] /heapster source=kubernetes.summary_api:""
heapster.go:73] Heapster version v1.3.0
configs.go:61] Using Kubernetes client with master "https://10.0.0.1:443" and version v1
configs.go:62] Using kubelet port 10255
heapster.go:196] Starting with Metric Sink
heapster.go:106] Starting heapster on port 8082
Logs from heapster-nanny container
pod_nanny.go:56] Invoked by [/pod_nanny --cpu=80m --extra-cpu=0.5m --memory=140Mi --extra-memory=4Mi --threshold=5 --deployment=heapster --container=heapster --poll-period=300000 --estimator=exponential]
pod_nanny.go:68] Watching namespace: kube-system, pod: heapster-2708163903-mqlsq, container: heapster.
pod_nanny.go:69] cpu: 80m, extra_cpu: 0.5m, memory: 140Mi, extra_memory: 4Mi, storage: MISSING, extra_storage: 0Gi
pod_nanny.go:110] Resources: [{Base:{i:{value:80 scale:-3} d:{Dec:<nil>} s:80m Format:DecimalSI} ExtraPerNode:{i:{value:5 scale:-4} d:{Dec:<nil>} s: Format:DecimalSI} Name:cpu} {Base:{i:{value:146800640 scale:0} d:{Dec:<nil>} s:140Mi Format:BinarySI} ExtraPerNode:{i:{value:4194304 scale:0} d:{Dec:<nil>} s:4Mi Format:BinarySI} Name:memory}]
It is completely normal and important that the Deployment Controller keeps old ReplicaSet resources in order to do fast rollbacks.
A Deployment resource manages ReplicaSet resources. Your heapster Deployment is configured to run 1 pod - this means it will always try to create one ReplicaSet with 1 pod. If you make an update to the Deployment (say, a new heapster version), the Deployment resource creates a new ReplicaSet which schedules a pod with the new version. At the same time, the old ReplicaSet resource sets its desired pods to 0, but the resource itself is still kept for easy rollbacks. As you can see, the old ReplicaSet rs/heapster-867061013 has 0 pods running. If you do a rollback, the Deployment deploy/heapster will increase the number of pods in rs/heapster-867061013 to 1 and decrease the number in rs/heapster-2708163903 back to 0. You should also check out the documentation about the Deployment Controller (in case you haven't done it yet).
Still, it seems odd to me that your newly created Deployment Controller would instantly create 2 ReplicaSets. Did you wait a few seconds (say, 20) after deleting the Deployment Controller and before creating a new one? For me it sometimes takes a while before deletions propagate throughout the whole cluster, and if I recreate too quickly, the same resource is reused.
Concerning the heapster pod recreation you mentioned: pods have a restartPolicy. If it is set to Never, the pod will be recreated by its ReplicaSet if it exits (this means a new pod resource is created and the old one is deleted). My guess is that your heapster pod has this Never policy set. It might exit due to some error and reach a Failed state (you need to check that with the logs). Then, after a short while, the ReplicaSet creates a new pod.
OK, so it turned out to be a problem in the Azure container service's default Kubernetes configuration. I got some help from an Azure support engineer.
The problem is fixed by adding the label addonmanager.kubernetes.io/mode: EnsureExists to the heapster Deployment. Here is the pull request that the support engineer referenced: https://github.com/Azure/acs-engine/pull/1133
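For reference, a sketch of where that label goes in the heapster Deployment metadata (the apiVersion and other fields depend on your cluster version and are assumptions here):
apiVersion: apps/v1   # may be extensions/v1beta1 on older clusters
kind: Deployment
metadata:
  name: heapster
  namespace: kube-system
  labels:
    k8s-app: heapster
    addonmanager.kubernetes.io/mode: EnsureExists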