Kubernetes, minikube and VPA: VPA doesn't scale up to target

Before I start: I'm running Kubernetes on a Mac.
minikube: 1.17.0
metrics-server: 1.8+
VPA: vpa-release-0.8
My issue is that the VPA doesn't scale up my pod; it just keeps recreating pods. I followed the GKE VPA example. I set the deployment's resource requests to cpu: 100m, memory: 50Mi and deployed the VPA. It gave me a recommendation, and updateMode is Auto as well. But it keeps recreating the pod without changing the resource requests, which I verified on the recreated pod with kubectl describe pod <podname>.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app-deployment
  updatePolicy:
    updateMode: "Auto"
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app-deployment
  template:
    metadata:
      labels:
        app: my-app-deployment
    spec:
      containers:
      - name: my-container
        image: k8s.gcr.io/ubuntu-slim:0.1
        resources:
          requests:
            cpu: 100m
            memory: 50Mi
        command: ["/bin/sh"]
        args: ["-c", "while true; do timeout 0.5s yes >/dev/null; sleep 0.5s; done"]
Status:
  Conditions:
    Last Transition Time:  2021-02-03T03:13:38Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  my-container
      Lower Bound:
        Cpu:     25m
        Memory:  262144k
      Target:
        Cpu:     548m
        Memory:  262144k
      Uncapped Target:
        Cpu:     548m
        Memory:  262144k
      Upper Bound:
        Cpu:     100G
        Memory:  100T
Events:  <none>
I tried with kind as well. There the pods do get recreated with the new resource requests, but they never run and stay Pending because the node doesn't have enough resources. I think the reason VPA doesn't work properly on minikube is that I didn't create multiple nodes. Do you think that's related?
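In case it helps anyone debugging the same symptom: with updateMode Auto, the vpa-updater only evicts the pod, and the new requests are applied by the vpa-admission-controller when the replacement pod is admitted, so "recreated but unchanged" usually points at that component. A minimal sketch of a check, assuming the VPA components were installed into kube-system (the upstream vpa-up.sh default) and substituting your own pod name:

$ kubectl get pods -n kube-system | grep vpa              # recommender, updater and admission-controller should all be Running
$ kubectl get mutatingwebhookconfigurations | grep vpa    # the admission controller registers a mutating webhook
$ kubectl get pod <recreated-pod-name> -o jsonpath='{.spec.containers[0].resources.requests}'   # requests actually applied to the new pod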

Related

How often does kube-Scheduler refresh node resource data

I have a project to modify the scheduling policy. I deployed a large number of pods at the same time, but they do not seem to be scheduled as expected. I think kube-scheduler caches the resource usage of the nodes, so the pods need to be deployed in two batches.
The pod YAML is as follows; I run multiple pods through a shell loop (a sketch of such a loop follows the YAML below).
apiVersion: v1
kind: Pod
metadata:
  name: ${POD_NAME}
  labels:
    name: multischeduler-example
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-with-second-annotation-container
    image: ibmcom/pause:3.1
    resources:
      requests:
        memory: "1Gi"
        cpu: "1"
      limits:
        memory: "2Gi"
        cpu: "2"
I want to know at what interval kube-scheduler refreshes its cached node resource data.
I really appreciate any help with this

Kubernetes HPA not working: unable to get metrics

My pod scaler fails to deploy, and keeps giving an error of FailedGetResourceMetric:
Warning FailedComputeMetricsReplicas 6s horizontal-pod-autoscaler failed to compute desired number of replicas based on listed metrics for Deployment/default/bot-deployment: invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
I have ensured that metrics-server is installed, as you can see when I run the following command to show the metrics-server deployment on the cluster:
kubectl get deployment metrics-server -n kube-system
It shows this:
metrics-server
I also set the --kubelet-insecure-tls and --kubelet-preferred-address-types=InternalIP options in the args section of the metrics-server manifest file.
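For reference, those two flags go into the args of the metrics-server container in components.yaml; a sketch of the relevant fragment (the image tag is illustrative and the other default args are omitted):

containers:
- name: metrics-server
  image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1   # tag is illustrative
  args:
  # ...default args from components.yaml kept as-is...
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP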
This is what my deployment manifest looks like:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bot-deployment
  labels:
    app: bot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bot
  template:
    metadata:
      labels:
        app: bot
    spec:
      containers:
      - name: bot-api
        image: gcr.io/<repo>
        ports:
        - containerPort: 5600
        volumeMounts:
        - name: bot-volume
          mountPath: /core
      - name: wallet
        image: gcr.io/<repo>
        ports:
        - containerPort: 5000
        resources:
          requests:
            cpu: 800m
          limits:
            cpu: 1500m
        volumeMounts:
        - name: bot-volume
          mountPath: /wallet_
      volumes:
      - name: bot-volume
        emptyDir: {}
The specification for my pod scaler is shown below too:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: bot-scaler
spec:
  metrics:
  - resource:
      name: cpu
      target:
        averageUtilization: 85
        type: Utilization
    type: Resource
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bot-deployment
  minReplicas: 1
  maxReplicas: 10
Because of this, the TARGETS column always remains at <unknown>/80%. Upon introspection, the HPA keeps making that same complaint over and over again. I have tried all the options I have seen in other questions, but none of them seem to work. I have also tried uninstalling and reinstalling the metrics-server many times, but it doesn't help.
One thing I notice, though, is that the metrics-server seems to shut down after I deploy the HPA manifest, and it fails to start again. When I check the state of the metrics-server, the READY column shows 0/1 even though it was initially 1/1. What could be wrong?
I will gladly provide as much info as needed. Thank you!
Looks like your bot-api container is missing its resource request and limit; your wallet container has them, though. The HPA uses the resources of all containers in the pod to calculate utilization.
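A sketch of what adding them to the bot-api container could look like; the CPU numbers below are placeholders, not recommendations:

- name: bot-api
  image: gcr.io/<repo>
  ports:
  - containerPort: 5600
  resources:
    requests:
      cpu: 200m      # placeholder value
    limits:
      cpu: 500m      # placeholder value
  volumeMounts:
  - name: bot-volume
    mountPath: /core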

"Must specify limits.cpu" error during pod deployment even though cpu limit is specified

I am trying to run a test pod with OpenShift CLI:
$ oc run nginx --image=nginx --limits=cpu=2,memory=4Gi
deploymentconfig.apps.openshift.io/nginx created

$ oc describe deploymentconfig.apps.openshift.io/nginx
Name:           nginx
Namespace:      myproject
Created:        12 seconds ago
Labels:         run=nginx
Annotations:    <none>
Latest Version: 1
Selector:       run=nginx
Replicas:       1
Triggers:       Config
Strategy:       Rolling
Template:
Pod Template:
  Labels:       run=nginx
  Containers:
   nginx:
    Image:        nginx
    Port:         <none>
    Host Port:    <none>
    Limits:
      cpu:        2
      memory:     4Gi
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>

Deployment #1 (latest):
  Name:         nginx-1
  Created:      12 seconds ago
  Status:       New
  Replicas:     0 current / 0 desired
  Selector:     deployment=nginx-1,deploymentconfig=nginx,run=nginx
  Labels:       openshift.io/deployment-config.name=nginx,run=nginx
  Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  Type     Reason             Age                From                         Message
  ----     ------             ----               ----                         -------
  Normal   DeploymentCreated  12s                deploymentconfig-controller  Created new replication controller "nginx-1" for version 1
  Warning  FailedCreate       1s (x12 over 12s)  deployer-controller          Error creating deployer pod: pods "nginx-1-deploy" is forbidden: failed quota: quota-svc-myproject: must specify limits.cpu,limits.memory
I get "must specify limits.cpu,limits.memory" error, despite both limits being present in the same describe output.
What might be the problem and how do I fix it?
I found a solution!
Part of the error message was "Error creating deployer pod". It means that the problem is not with my pod, but with the deployer pod which performs my pod deployment.
It seems the quota in my project affects deployer pods as well.
I couldn't find a way to set deployer pod limits with CLI, so I've made a DeploymentConfig.
kind: "DeploymentConfig"
apiVersion: "v1"
metadata:
name: "test-app"
spec:
template:
metadata:
labels:
name: "test-app"
spec:
containers:
- name: "test-app"
image: "nginxinc/nginx-unprivileged"
resources:
limits:
cpu: "2000m"
memory: "20Gi"
ports:
- containerPort: 8080
protocol: "TCP"
replicas: 1
selector:
name: "test-app"
triggers:
- type: "ConfigChange"
- type: "ImageChange"
imageChangeParams:
automatic: true
containerNames:
- "test-app"
from:
kind: "ImageStreamTag"
name: "nginx-unprivileged:latest"
strategy:
type: "Rolling"
resources:
limits:
cpu: "2000m"
memory: "20Gi"
As you can see, two sets of limits are specified here: one for the container and one for the deployment strategy.
With this configuration it worked fine!
Looks like you have specified a resource quota, and the values you specified for limits seem to be larger than what it allows. Can you describe the resource quota with oc describe quota quota-svc-myproject and adjust your configs accordingly?
A good reference could be https://docs.openshift.com/container-platform/3.11/dev_guide/compute_resources.html
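A quick sketch of that check (the quota name is taken from the error message; the Used and Hard columns in the output show how much of each resource is left under the quota):

$ oc describe quota quota-svc-myproject
# Compare the Used and Hard values for limits.cpu and limits.memory;
# the deployer pod's limits must fit in the remaining headroom.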

Kubernetes HPA on AKS is failing with error 'missing request for cpu'

I am trying to setup HPA for my AKS cluster. Following is the Kubernetes manifest file:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: XXXXXX\tools\kompose.exe convert
    kompose.version: 1.21.0 (992df58d8)
  creationTimestamp: null
  labels:
    io.kompose.service: loginservicedapr
  name: loginservicedapr
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: loginservicedapr
  strategy: {}
  template:
    metadata:
      annotations:
        kompose.cmd: XXXXXX\kompose.exe convert
        kompose.version: 1.21.0 (992df58d8)
      creationTimestamp: null
      labels:
        io.kompose.service: loginservicedapr
    spec:
      containers:
      - image: XXXXXXX.azurecr.io/loginservicedapr:latest
        imagePullPolicy: ""
        name: loginservicedapr
        resources:
          requests:
            cpu: 250m
          limits:
            cpu: 500m
        ports:
        - containerPort: 80
        resources: {}
      restartPolicy: Always
      serviceAccountName: ""
      volumes: null
status: {}
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: XXXXXXXXXX\kompose.exe convert
    kompose.version: 1.21.0 (992df58d8)
  creationTimestamp: null
  labels:
    io.kompose.service: loginservicedapr
  name: loginservicedapr
spec:
  type: LoadBalancer
  ports:
  - name: "5016"
    port: 5016
    targetPort: 80
  selector:
    io.kompose.service: loginservicedapr
status:
  loadBalancer: {}
Following is my HPA yaml file:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: loginservicedapr-hpa
spec:
  maxReplicas: 10 # define max replica count
  minReplicas: 3  # define min replica count
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: loginservicedapr
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
But the HPA is failing with the error 'FailedGetResourceMetric' - 'missing request for cpu'.
I have also installed metrics-server (though not sure whether that was required or not) using the following statement:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
But still I am getting the following output when I do 'kubectl describe hpa':
Name:                           loginservicedapr-hpa
Namespace:                      default
Labels:                         fluxcd.io/sync-gc-mark=sha256.Y6dHhIOs-hNYbDmJ25Ijw1YsJ_8f0PH3Vlruj5rfbFk
Annotations:                    fluxcd.io/sync-checksum: d5c0d9eda6db0c40f1e5e23e1356d0268dbccc8f
                                kubectl.kubernetes.io/last-applied-configuration:
                                  {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{"fluxcd.io/sync-checksum":"d5c0d9eda6db0c40f1e5...
CreationTimestamp:              Wed, 08 Jul 2020 17:19:47 +0530
Reference:                      Deployment/loginservicedapr
Metrics:                        ( current / target )
  resource cpu on pods (as a percentage of request):  <unknown> / 50%
Min replicas:                   3
Max replicas:                   10
Deployment pods:                3 current / 3 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: missing request for cpu
Events:
  Type     Reason                        Age                      From                       Message
  ----     ------                        ----                     ----                       -------
  Warning  FailedComputeMetricsReplicas  33m (x1234 over 6h3m)    horizontal-pod-autoscaler  Invalid metrics (1 invalid out of 1), last error was: failed to get cpu utilization: missing request for cpu
  Warning  FailedGetResourceMetric       3m11s (x1340 over 6h3m)  horizontal-pod-autoscaler  missing request for cpu
I have two more services deployed along with 'loginservicedapr'. I have not written HPAs for those services, but I have included resource limits for them in their YAML files as well. How do I make this HPA work?
resources appears twice in your pod spec.
resources:          # once here
  requests:
    cpu: 250m
  limits:
    cpu: 500m
ports:
- containerPort: 80
resources: {}       # another here, clearing it
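If that is the cause, the fix is simply to keep a single resources block on the container; a merged sketch based on the manifest above:

containers:
- name: loginservicedapr
  image: XXXXXXX.azurecr.io/loginservicedapr:latest
  ports:
  - containerPort: 80
  resources:
    requests:
      cpu: 250m
    limits:
      cpu: 500m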
I was able to resolve the issue by changing the following in my kubernetes manifest file from this:
resources:
  requests:
    cpu: 250m
  limits:
    cpu: 500m
to the following:
resources:
  requests:
    cpu: "250m"
  limits:
    cpu: "500m"
HPA worked after that. Following is the GitHub link which gave the solution:
https://github.com/kubernetes-sigs/metrics-server/issues/237
But I did not add any Internal IP address command or anything else.
This is typically related to the metrics server.
Make sure you are not seeing anything unusual about the metrics server installation:
# This should show you metrics (they come from the metrics server)
$ kubectl top pods
$ kubectl top nodes
or check the logs:
$ kubectl logs <metrics-server-pod>
Also, check your kube-controller-manager logs for HPA events related entries.
Furthermore, if you'd like to explore more on whether your pods have missing requests/limits you can simply see the full output of your running pod managed by the HPA:
$ kubectl get pod <pod-name> -o=yaml
Some other people have had luck deleting and renaming the HPA too.
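If you want to try that last option, a minimal sketch (the HPA name is taken from this question; the file name is assumed):

$ kubectl delete hpa loginservicedapr-hpa
$ kubectl apply -f loginservicedapr-hpa.yaml   # re-create the HPA from your manifest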

How to propagate kubernetes events from a GKE cluster to google cloud log

Is there any way to propagate all Kubernetes events to Google Cloud Logging? For instance, a pod creation/deletion or a failed liveness probe. I know I can use kubectl get events in a console; however, I would like to preserve those events in Cloud Logging alongside the other pod-level logs. It is quite helpful information.
It seems that OP found the logs, but I wasn't able to on GKE (1.4.7) with Stackdriver. It was a little tricky to figure out, so I thought I'd share for others. I was able to get them by creating an eventer deployment with the gcl sink.
For example:
deployment.yaml
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  labels:
    k8s-app: eventer
  name: eventer
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: eventer
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: eventer
    spec:
      containers:
      - name: eventer
        command:
        - /eventer
        - --source=kubernetes:''
        - --sink=gcl
        image: gcr.io/google_containers/heapster:v1.2.0
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
Then, search for logs with an advanced filter (substitute your GCE project name):
resource.type="global"
logName="projects/project-name/logs/kubernetes.io%2Fevents"