I wanted to set up an HPA for a deployment on my Kubernetes cluster (1.14.0 on bare metal), so I followed the instructions for setting up metrics-server here: https://github.com/kubernetes-sigs/metrics-server.
After deploying metrics-server, I am able to issue commands like kubectl top nodes and deploy HPAs using kubectl autoscale deployment <deployment-name> --min=1 ...
Currently, the issue I am facing is that the HPAs created by kubectl autoscale ... seem to be deleted automatically after around 4-5 minutes. I feel like I am missing some important information or step related to HPA on Kubernetes, but I couldn't find anything about this particular issue when searching online.
Related
A while back a GKE cluster got created which came with a daemonset of:
kubectl get daemonsets --all-namespaces
...
kube-system prometheus-to-sd 6 6 6 3 6 beta.kubernetes.io/os=linux 355d
Can I delete this daemonset without issue?
What is it being used for?
What functionality would I be losing without it?
TL;DR
Even if you delete it, it will be back.
A little bit more explanation
Citing user Yasen's explanation of what prometheus-to-sd is:
prometheus-to-sd is a simple component that can scrape metrics stored in prometheus text format from one or multiple components and push them to the Stackdriver. Main requirement: k8s cluster should run on GCE or GKE.
Github.com: Prometheus-to-sd
Assuming that the command deleting this daemonset will be:
$ kubectl delete daemonset prometheus-to-sd --namespace=kube-system
Executing this command will indeed delete the daemonset, but it will be back after a while.
The prometheus-to-sd daemonset is managed by the Addon-Manager, which recreates a deleted daemonset in its original state.
Below is the part of the prometheus-to-sd daemonset YAML definition which states that this daemonset is managed by the addon manager:
labels:
addonmanager.kubernetes.io/mode: Reconcile
You can read more about it by following: Github.com: Kubernetes: addon-manager
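To check whether a given resource carries this label before trying to delete it, a sketch like the following may help. The function name addon_mode is made up for illustration; it only wraps kubectl get with a JSONPath expression that reads the label shown above, and it assumes kubectl is already configured for the cluster.

```shell
# Print the addon-manager mode label of a resource (empty output if unset).
# Usage: addon_mode <kind> <name> <namespace>
# The function name is hypothetical; kubectl must be on PATH.
addon_mode() {
  kubectl get "$1" "$2" --namespace="$3" \
    -o jsonpath='{.metadata.labels.addonmanager\.kubernetes\.io/mode}'
}

# Example: addon_mode daemonset prometheus-to-sd kube-system
```

For the daemonset in the question, the YAML excerpt above suggests this would print Reconcile, meaning a manual delete will be undone.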
Whether this daemonset can be deleted depends on the monitoring/logging solution you are using with your GKE cluster. There are 2 options:
Stackdriver logging/monitoring
Legacy logging/monitoring
Stackdriver logging/monitoring
You need to completely disable logging and monitoring of your GKE cluster to delete this daemonset.
You can do it by following a path:
GCP -> Kubernetes Engine -> Cluster -> Edit -> Kubernetes Engine Monitoring -> Set to disabled.
Legacy logging/monitoring
If you are using a legacy solution which is available to GKE version 1.14, you need to disable the option of Legacy Stackdriver Monitoring by following the same path as above.
Let me know if you have any questions about that.
TL;DR - it's ok
Given your context, I suppose it's ok to shut down the Prometheus component of your cluster.
The exception is when reports, alerts and monitoring are critical parts of your system.
Let's dive into the GCP sources
As per source code at GoogleCloudPlatform:
prometheus-to-sd is a simple component that can scrape metrics stored in prometheus text format from one or multiple components and push them to the Stackdriver. Main requirement: k8s cluster should run on GCE or GKE.
Prometheus
From their Prometheus Github Page:
The Prometheus monitoring system and time series database.
To get a picture of what it is for, you can read an awesome guide on Prometheus: Prometheus Monitoring : The Definitive Guide in 2019 – devconnected
Also, there are hundreds of videos on their Youtube channel Prometheus Monitoring
Your questions
So, to answer your questions:
Can I delete this daemonset without issue?
It depends. As I said, you can, except in cases where reports, alerts and monitoring are critical parts of your system.
What is it being used for?
Prometheus itself is a TSDB for monitoring; prometheus-to-sd pushes metrics in Prometheus format to Stackdriver.
What functionality would I be losing without it?
metrics
→ therefore dashboards
→ therefore alerting
I'm trying to manually update the maximum number of replicas for autoscaling with the kubectl autoscale command.
However, each time I run the command it creates a new HPA that fails to launch the pod, and I have no idea why :(
Do you have an idea how I can manually update my HPA with kubectl?
https://gist.github.com/zyriuse75/e75a75dc447eeef9e8530f974b19c28a
I think you are mixing two topics here. One is manually scaling a pod (you can do it through a deployment by running kubectl scale deploy {mydeploy} --replicas={#repl}). On the other hand, you have the HPA (Horizontal Pod Autoscaler); to use it, you need a metrics provider configured in the cluster,
e.g:
metrics server
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/metrics-server
heapster (deprecated) https://github.com/kubernetes-retired/heapster
Then you can create an HPA to handle your autoscaling; you can get more info at this link: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
Once it is created, you can patch your HPA, or delete it and create it again:
kubectl delete hpa hpa-pod -n ns-svc-cas
kubectl autoscale deployment {mydeploy} --min={#number} --max={#number} -n ns-svc-cas
easiest way
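The answer mentions patching the HPA as an alternative to deleting and recreating it. A minimal sketch of that, assuming the HPA name and namespace from the commands above, could look like this. The helper names hpa_max_patch and set_hpa_max are hypothetical; the only real command involved is kubectl patch.

```shell
# Build the JSON body of a patch that sets spec.maxReplicas on an HPA.
hpa_max_patch() {
  printf '{"spec":{"maxReplicas":%d}}' "$1"
}

# Apply the patch in place; the HPA keeps its name, target and other settings.
# Usage: set_hpa_max <hpa-name> <namespace> <max-replicas>
set_hpa_max() {
  kubectl patch hpa "$1" -n "$2" --patch "$(hpa_max_patch "$3")"
}

# Example, using the names from the commands above:
# set_hpa_max hpa-pod ns-svc-cas 10
```

Patching avoids creating a second HPA for the same target, which seems to be what happened when kubectl autoscale was run repeatedly.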
We're running kops->terraform k8s clusters in AWS, and because of the number of k8s jobs we have in flight at once, the kube-dns container is getting OOMkilled, forcing us to raise the memory limit.
As for automating this so it both survives cluster upgrades and is automatically done for new clusters created from the same template, I don't see a way to override the canned kops spec. The only options I can see involve some update (kubectl edit deployment kube-dns, delete the kube-dns add-on deployment and use our own, overwrite the spec uploaded to the kops state store, etc.) that probably needs to be done each time after using kops to update the cluster.
I've checked the docs and even the spec source and no other options stand out. Is there a way to pass a custom kube-dns deployment spec to kops? Or tell it not to install the kube-dns add-on?
I created a cluster on Google Kubernetes Engine with the Cluster Autoscaler option enabled.
I want to configure the scaling behavior, such as --scale-down-delay-after-delete, according to https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md .
But I found no Pod or Deployment in kube-system that is the cluster autoscaler.
Does anyone have ideas?
Edit:
I am not talking about the Horizontal Pod Autoscaler.
And I hope I can configure it like this:
$ gcloud container clusters update cluster-1 --enable-autoscaling --scan-interval=5 --scale-down-unneeded-time=3m
ERROR: (gcloud.container.clusters.update) unrecognized arguments:
--scan-interval=5
--scale-down-unneeded-time=3m
It is not possible according to https://github.com/kubernetes/autoscaler/issues/966
Probably because there is no way to access the executable (which it seems to be) on GKE.
You can't even view the logs of the autoscaler on GKE: https://github.com/kubernetes/autoscaler/issues/972
One way is to not enable the GKE autoscaler, and then manually install it on a worker node - per the project's docs:
Users can put it into kube-system namespace (Cluster Autoscaler doesn't scale down node with non-mirrored kube-system pods running on them) and set a priorityClassName: system-cluster-critical property on your pod spec (to prevent your pod from being evicted).
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#deployment
I would also think you could annotate the autoscaler pod(s) with the following:
"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
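If the autoscaler is installed manually as a Deployment, one way to attach that annotation is to patch the pod template so that recreated pods keep it. This is only a sketch: the helper names are hypothetical, and the deployment name in the example is an assumption about how the self-managed autoscaler was installed.

```shell
# Build a patch body that adds the safe-to-evict annotation to the pod template.
safe_to_evict_patch() {
  printf '%s' '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"false"}}}}}'
}

# Apply it to a Deployment in kube-system.
# Usage: mark_not_evictable <deployment-name>
mark_not_evictable() {
  kubectl patch deployment "$1" -n kube-system --patch "$(safe_to_evict_patch)"
}

# Example (the deployment name is an assumption):
# mark_not_evictable cluster-autoscaler
```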
If I understand correctly, you need this:
Check your deployment's name with:
kubectl get deployments
And autoscale it by:
kubectl autoscale deployment your_deployment_name --cpu-percent=100 --min=1 --max=10
I have spun up a Kubernetes cluster in AWS using the official "kube-up" mechanism. By default, an addon that monitors the cluster and logs to InfluxDB is created. It has been noted in this post that InfluxDB quickly fills up disk space on nodes, and I am seeing this same issue.
The problem is, when I try to kill the InfluxDB replication controller and service, it "magically" comes back after a time. I do this:
kubectl delete rc --namespace=kube-system monitoring-influx-grafana-v1
kubectl delete service --namespace=kube-system monitoring-influxdb
kubectl delete service --namespace=kube-system monitoring-grafana
Then if I say:
kubectl get pods --namespace=kube-system
I do not see the pods running anymore. However after some amount of time (minutes to hours), the replication controllers, services, and pods are back. I don't know what is restarting them. I would like to kill them permanently.
You probably need to remove the manifest files for influxdb from the /etc/kubernetes/addons/ directory on your "master" host. Many of the kube-up.sh implementations use a service (usually at /etc/kubernetes/kube-master-addons.sh) that runs periodically and makes sure that all the manifests in /etc/kubernetes/addons/ are active.
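Before removing anything, it can help to see which manifest files in that directory actually mention InfluxDB. The following is a small sketch for that; the function name find_addon_manifests is made up, and the exact file names and layout under /etc/kubernetes/addons/ vary between kube-up.sh implementations, so review the output rather than deleting blindly.

```shell
# List manifest files under an addons directory that mention a keyword
# (case-insensitive, recursive). Defaults to /etc/kubernetes/addons.
# Usage: find_addon_manifests <keyword> [directory]
find_addon_manifests() {
  grep -ril -- "$1" "${2:-/etc/kubernetes/addons}"
}

# Example, run on the master host:
# find_addon_manifests influx
```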
You can also restart your cluster, but run export ENABLE_CLUSTER_MONITORING=none before running kube-up.sh. You can see other environment settings that impact the cluster kube-up.sh builds at cluster/aws/config-default.sh