I want to add a resource limit and request using Kustomize, but only if they are not already configured. The problem is that the deployment is in fact a list of deployments, so I cannot use default values:
values.yaml
myDeployments:
  - name: deployment1
  - name: deployment2
    resources:
      limits:
        cpu: 150
        memory: 200
kustomize.yaml
- target:
    kind: Deployment
  patch: |-
    - op: add
      path: "/spec/template/spec/containers/0/resources"
      value:
        limits:
          cpu: 300
          memory: 400
The problem here is that it replaces both deployments' resources, ignoring the resources defined in values.yaml.
You can't make Kustomize conditionally apply a patch based on whether or not the resource limits already exist. You could use labels to identify deployments that should receive the default resource limits, e.g. given something like:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
  labels:
    example.com/default_limits: "true"
spec:
  replicas: 1
  template:
    spec:
      [...]
You could do something like this in your kustomization.yaml:
- target:
    kind: Deployment
    labelSelector: example.com/default_limits=true
  patch: |-
    - op: add
      path: "/spec/template/spec/containers/0/resources"
      value:
        limits:
          cpu: 300
          memory: 400
However, you could also simply set default resource limits in your target namespace. See "Configure Default CPU Requests and Limits for a Namespace" for details. You would create a LimitRange resource in your namespace:
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-limit-range
spec:
  limits:
    - type: Container
      default:
        cpu: 150
        memory: 200
This would be applied to any containers that don't declare their own resource limits, which I think is the behavior you're looking for.
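As a quick sanity check (the namespace, file, and pod names below are placeholders, not from the original setup), you could confirm that admission filled in the defaults for a container that declares no resources:

# Create the LimitRange in the namespace the deployments run in (names are placeholders)
kubectl apply -n my-namespace -f limit-range.yaml

# Any container created afterwards without its own resources block should
# now show the defaulted limits:
kubectl get pod my-pod -n my-namespace -o jsonpath='{.spec.containers[0].resources}'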
I'm trying to patch multiple targets of different types (let's say a Deployment and a ReplicaSet) using the kubectl command. I've made the following file with all the patch info:
patch_list_changes.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-metric-sd
  namespace: default
spec:
  template:
    spec:
      containers:
        - name: sd-dummy-exporter
          resources:
            requests:
              cpu: 90m
---
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
  namespace: default
spec:
  template:
    spec:
      containers:
        - name: php-redis
          resources:
            requests:
              cpu: 200m
I've tried the following commands in the terminal, but nothing allows my patches to work:
> kubectl patch -f patch_list_changes.yaml --patch-file patch_list_changes.yaml
deployment.apps/custom-metric-sd patched
Error from server (BadRequest): the name of the object (custom-metric-sd) does not match the name on the URL (frontend)
and
> kubectl apply -f patch_list_changes.yaml
error: error validating "patch_list_changes.yaml": error validating data: [ValidationError(Deployment.spec.template.spec): unknown field "resources" in io.k8s.api.core.v1.PodSpec, ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec]; if you choose to ignore these errors, turn validation off with --validate=false
Is there any way to run multiple patches in a single command?
The appropriate approach is to use a Kustomization for this purpose:
https://github.com/nirgeier/KubernetesLabs/tree/master/Labs/08-Kustomization
Based upon those samples, I wrote the following example. Prepare your patches and use a Kustomization:
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  - ../../_base
patchesStrategicMerge:
  - patch-memory.yaml
  - patch-replicas.yaml
  - patch-service.yaml
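The referenced patch files are ordinary strategic-merge patches. As a sketch only, reusing the Deployment and container names from patch_list_changes.yaml above (the memory values are placeholders), patch-memory.yaml could look like:

# patch-memory.yaml (sketch; the memory values are placeholders)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-metric-sd
spec:
  template:
    spec:
      containers:
        - name: sd-dummy-exporter
          resources:
            requests:
              memory: 128Mi
            limits:
              memory: 256Mi

kustomize build then merges each patch into the matching base resource by group/version/kind/name, which avoids the name-mismatch error that kubectl patch produced.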
I am deploying a Flink stateful app using the YAML file below.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: operational-reporting-15gb
spec:
  image: .azurecr.io/stateful-app-v2
  flinkVersion: v1_15
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    state.savepoints.dir: abfs://flinktest#.dfs.core.windows.net/savepoints.v2
    state.checkpoints.dir: abfs://flinktest#.dfs.core.windows.net/checkpoints.v2
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: abfs://flinktest#.dfs.core.windows.net/ha.v2
  serviceAccount: flink
  jobManager:
    resource:
      memory: "15360m"
      cpu: 2
  taskManager:
    resource:
      memory: "15360m"
      cpu: 3
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /flink-data
              name: flink-volume
      volumes:
        - name: flink-volume
          emptyDir: {}
  job:
    jarURI: local:///opt/operationalReporting.jar
    parallelism: 1
    upgradeMode: savepoint
    state: running
The Flink jobs are running perfectly.
For auto-scaling, I created an HPA using the following manifest.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: basic-hpa
  namespace: default
spec:
  minReplicas: 1
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageValue: 100m
  scaleTargetRef:
    apiVersion: flink.apache.org/v1beta1
    kind: FlinkDeployment
    name: operational-reporting-15gb
While describing the autoscaler, I get the error below.
Type          Status  Reason          Message
AbleToScale   False   FailedGetScale  the HPA controller was unable to get the target's current scale: flinkdeployments.flink.apache.org "operational-reporting-15gb" not found

Events:
  Type     Reason          Age                   From                       Message
  Warning  FailedGetScale  4m4s (x121 over 34m)  horizontal-pod-autoscaler  flinkdeployments.flink.apache.org "operational-reporting-15gb" not found
The HPA target is showing UNKNOWN. Kindly help.
I assume you are following the HPA example of the Flink Kubernetes Operator. Thanks for giving it a try; it is an experimental feature, as outlined in the docs, and we only have limited experience with it at the moment.
That said, checking the obvious: is your FlinkDeployment named operational-reporting-15gb running in the default namespace? If not, please adjust the namespace of your HPA accordingly.
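A quick way to check this (assuming kubectl is pointed at the right cluster) would be something like:

kubectl get flinkdeployment operational-reporting-15gb -n default

If that returns NotFound, the HPA and the FlinkDeployment are not in the same namespace.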
Also, please make sure that you have the latest FlinkDeployment CRD installed. Having v1beta1 only ensures compatibility; it is not actually a fixed version, and we added the scale subresource relatively recently.
git clone https://github.com/apache/flink-kubernetes-operator
cd flink-kubernetes-operator
kubectl replace -f helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml
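If you want to verify that the installed CRD actually exposes the scale subresource the HPA needs, a check along these lines should print the subresources declared for each served version:

kubectl get crd flinkdeployments.flink.apache.org \
  -o jsonpath='{range .spec.versions[*]}{.name}{": "}{.subresources}{"\n"}{end}'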
My pod scaler fails to deploy, and keeps giving an error of FailedGetResourceMetric:
Warning FailedComputeMetricsReplicas 6s horizontal-pod-autoscaler failed to compute desired number of replicas based on listed metrics for Deployment/default/bot-deployment: invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
I have ensured that metrics-server is installed, as you can see when I run the following command to show the metrics-server resource on the cluster:
kubectl get deployment metrics-server -n kube-system
It shows this:
metrics-server
I also set the --kubelet-insecure-tls and --kubelet-preferred-address-types=InternalIP options in the args section of the metrics-server manifest file.
This is what my deployment manifest looks like:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bot-deployment
  labels:
    app: bot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bot
  template:
    metadata:
      labels:
        app: bot
    spec:
      containers:
        - name: bot-api
          image: gcr.io/<repo>
          ports:
            - containerPort: 5600
          volumeMounts:
            - name: bot-volume
              mountPath: /core
        - name: wallet
          image: gcr.io/<repo>
          ports:
            - containerPort: 5000
          resources:
            requests:
              cpu: 800m
            limits:
              cpu: 1500m
          volumeMounts:
            - name: bot-volume
              mountPath: /wallet_
      volumes:
        - name: bot-volume
          emptyDir: {}
The specifications for my pod scaler are shown below too:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: bot-scaler
spec:
  metrics:
    - resource:
        name: cpu
        target:
          averageUtilization: 85
          type: Utilization
      type: Resource
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bot-deployment
  minReplicas: 1
  maxReplicas: 10
Because of this, the TARGET column always remains at <unknown>/80%. Upon introspection, the HPA makes that same complaint over and over again. I have tried all the options that I have seen in some other questions, but none of them seem to work. I have also tried uninstalling and reinstalling the metrics-server many times, but it doesn't work.
One thing I notice, though, is that the metrics-server seems to shut down after I deploy the HPA manifest, and it fails to start. When I check the state of the metrics-server, the READY column shows 0/1 even though it was initially 1/1. What could be wrong?
I will gladly provide as much info as needed. Thank you!
Looks like your bot-api container is missing its resource requests and limits; your wallet container has them, though. The HPA uses the resources of all the containers in the pod to calculate the utilization.
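A minimal sketch of what that could look like on the bot-api container (the CPU numbers below are placeholders; pick values that match your workload):

- name: bot-api
  image: gcr.io/<repo>
  ports:
    - containerPort: 5600
  resources:            # added so the HPA can compute CPU utilization for every container
    requests:
      cpu: 500m         # placeholder value
    limits:
      cpu: 1000m        # placeholder value
  volumeMounts:
    - name: bot-volume
      mountPath: /core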
I am trying to auto-scale my Redis workers based on queue size. I am collecting the metrics using redis_exporter and prometheus-to-sd sidecars in my Redis deployment, like so:
spec:
  containers:
    - name: master
      image: redis
      env:
        - name: MASTER
          value: "true"
      ports:
        - containerPort: 6379
      resources:
        limits:
          cpu: "100m"
        requests:
          cpu: "100m"
    - name: redis-exporter
      image: oliver006/redis_exporter:v0.21.1
      env:
      ports:
        - containerPort: 9121
      args: ["--check-keys=rq*"]
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
    - name: prometheus-to-sd
      image: gcr.io/google-containers/prometheus-to-sd:v0.9.2
      command:
        - /monitor
        - --source=:http://localhost:9121
        - --stackdriver-prefix=custom.googleapis.com
        - --pod-id=$(POD_ID)
        - --namespace-id=$(POD_NAMESPACE)
        - --scrape-interval=15s
        - --export-interval=15s
      env:
        - name: POD_ID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.uid
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
I can then view the metric (redis_key_size) in Metrics Explorer as:
metric.type="custom.googleapis.com/redis_key_size"
resource.type="gke_container"
(I CAN'T view the metric if I change resource.type=k8_pod)
However, I can't seem to get the HPA to read these metrics; I get a failed to get metrics error and can't figure out the correct Object definition.
I've tried both .object.target.kind=Pod and Deployment; with Deployment I get the additional error "Get namespaced metric by name for resource \"deployments\"" is not implemented.
I don't know whether this issue is related to resource.type="gke_container" and, if so, how to change that.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Object
      object:
        target:
          kind: <not sure>
          name: <not sure>
        metricName: redis_key_size
        targetValue: 4
--- Update ---
This works if I use kind: Pod and manually set the name to the pod name created by the deployment; however, this is far from perfect.
I also tried this setup using type Pods; however, the HPA says it can't read the metrics: horizontal-pod-autoscaler failed to get object metric value: unable to get metric redis_key_size: no metrics returned from custom metrics API
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metricName: redis_key_size
        targetAverageValue: 4
As a workaround for deployments, it appears that the metrics have to be exported from pods IN the target deployment.
To get this working I had to move the prometheus-to-sd container to the deployment I wanted to scale, scrape the metrics exposed by redis_exporter in the Redis deployment via the Redis service (exposing 9121 on the Redis service), and change the command-line arguments for the prometheus-to-sd container so that:
- --source=:http://localhost:9121 -> - --source=:http://my-redis-service:9121
and then using the HPA:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metricName: redis_key_size
        targetAverageValue: 4
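For completeness, the Redis service referenced by the new --source flag has to expose the exporter port as well. A sketch of what that could look like (the service name my-redis-service and the app: redis selector are assumptions, not taken from the original manifests):

apiVersion: v1
kind: Service
metadata:
  name: my-redis-service     # assumed name, matching the --source flag above
spec:
  selector:
    app: redis               # assumed pod label on the Redis deployment
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
    - name: metrics          # exposes redis-exporter so prometheus-to-sd can scrape it
      port: 9121
      targetPort: 9121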
I am trying to create a resource quota for a namespace in Kubernetes. While writing the YAML file for the ResourceQuota, what should I specify for the CPU requests: cpu or requests.cpu? Also, is there any official documentation which specifies the difference between the two? I went through the OpenShift docs, which state that the two are the same and can be used interchangeably.
requests.cpu is used in a ResourceQuota, which is applied at the namespace level,
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
whereas cpu is applied at the pod level.
apiVersion: v1
kind: Pod
metadata:
  name: quota-mem-cpu-demo
spec:
  containers:
    - name: quota-mem-cpu-demo-ctr
      image: nginx
      resources:
        limits:
          memory: "800Mi"
          cpu: "800m"
        requests:
          memory: "600Mi"
          cpu: "400m"
For further details, please refer to the link below:
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/quota-memory-cpu-namespace/
You can use the cpu form if you follow the Kubernetes documentation.
The difference between adding the requests. or limits. prefix before memory or cpu in a quota is described here: https://kubernetes.io/docs/concepts/policy/resource-quotas/#requests-vs-limits
The final result is the same, but if you use the requests or limits form, every container in the pod will have to have those values specified explicitly.
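For illustration, a quota written with the plain cpu/memory form (which the documentation above treats as equivalent to requests.cpu/requests.memory) might look like this sketch; the name and values are arbitrary:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: cpu-mem-demo       # illustrative name
spec:
  hard:
    cpu: "1"               # same meaning as requests.cpu
    memory: 1Gi            # same meaning as requests.memory
    limits.cpu: "2"
    limits.memory: 2Gi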