First all of, for some reasons, I'm using an unsupported and obsolete version of Kubernetes (1.12), and I can't upgrade.
I'm trying to configure the scheduler to avoid running pods on some nodes by changing the node score when the scheduler try to find the best available node, and I would like to do that on scheduler level and not by using nodeAffinity at deployment, replicaset, pod, etc level (therefore all pods will be affected by this change).
After reading the k8s docs here: https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins and checking that some options were already present in 1.12, I'm trying to use the NodePreferAvoidPods plugins.
In the documentation the plugin specifies:
Scores nodes according to the node annotation scheduler.alpha.kubernetes.io/preferAvoidPods
Which if understand correctly should do the work.
So, i've updated the static manifest for kube-scheduler.yaml to use the following config:
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
profiles:
- plugins:
score:
enabled:
- name: NodePreferAvoidPods
weight: 100
clientConnection:
kubeconfig: /etc/kubernetes/scheduler.conf
But adding the following annotation
scheduler.alpha.kubernetes.io/preferAvoidPods: to the node doesn't seem to work.
For testing I'm made a basic nginx deployment with a replica equal to the number of worker nodes (4).
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 4
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.21
ports:
- containerPort: 80
Then I check where the pods where created with kubectl get pods -owide
So, I believe some options are required for this annotation to works.
I've tried to set the annotation to "true", "1" but k8s refuse my change and I can't figure what are the valid options for this annotation and I can't find any documentation about that.
I've checked within git release for 1.12, this plugin was already present (at least there are some lines of codes), I don't think the behavior or settings changed much since.
Thanks.
So from source Kubernetes codes here a valid value for this annoation:
{
"preferAvoidPods": [
{
"podSignature": {
"podController": {
"apiVersion": "v1",
"kind": "ReplicationController",
"name": "foo",
"uid": "abcdef123456",
"controller": true
}
},
"reason": "some reason",
"message": "some message"
}
]
}`
But there is no details on how to predict the uid and no answer where given when asked by another one on github years ago: https://github.com/kubernetes/kubernetes/issues/41630
For my initial question which was to avoid scheduling pods on node, I found an other method by using the well-known taint node.kubernetes.io/unschedulable and the value PreferNoSchedule
Tainting a node with this command do the job and this taint seem persistent across cordon/uncordon (a cordon will set to NoSchedule and uncordon will set it back to PreferNoSchedule).
kubectl taint node NODE_NAME node.kubernetes.io/unschedulable=:PreferNoSchedule
Related
I've created a deployment which exposes a custom metric through an endpoint and an APIService that registers this custom metric, so I can use it in an HPA to autoscale the deployment. To achieve this, I've followed this tutorial.
It worked well while using an apiregistration.k8s.io/v1beta1 APIService. The metric was exposed correctly and the HPA could read it and scale accordingly. I've tried to update the APIService to version apiregistration.k8s.io/v1 (as v1beta1 is deprecated and removed in Kubernetes v1.22), but then the HPA couldn't pick the metric anymore, with this message:
Message
-------
unable to get metric threatmessages: Service on test services-metrics-service/unable to fetch
metrics from custom metrics API: the server is currently unable to handle the request
(get services.custom.metrics.k8s.io services-metrics-service)
If I manually request the metric, it exists though:
kubectl get --raw /apis/custom.metrics.k8s.io/v1/namespaces/test/services/services-metrics-service/threatmessages |jq .
{
"kind": "MetricValueList",
"apiVersion": "custom.metrics.k8s.io/v1",
"metadata": {
"selfLink": "custom.metrics.k8s.io/v1"
},
"items": [
{
"metricName": "threatmessages",
"timestamp": "2021-02-09T14:43:39.321Z",
"value": "0",
"describedObject": {
"kind": "Service",
"namespace": "test",
"name": "services-metrics-service",
"apiVersion": "/v1"
}
}
]
}
Here are my APIService and HPA resources:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
name: v1.custom.metrics.k8s.io
spec:
insecureSkipTLSVerify: true
group: custom.metrics.k8s.io
groupPriorityMinimum: 1000
versionPriority: 5
service:
name: services-metrics-service
namespace: test
port: 443
version: v1
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: services-parallel-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: services-parallel-deployment
minReplicas: 1
maxReplicas: 10
metrics:
- type: Object
object:
describedObject:
kind: Service
name: services-metrics-service
metric:
name: threatmessages
target:
type: AverageValue
averageValue: 4k
behavior:
scaleDown:
stabilizationWindowSeconds: 30
policies:
- type: Pods
value: 1
periodSeconds: 30
What am I doing wrong? Or are these 2 versions just not compatible for some reason?
According to APIService 1.22 documentation you can find information:
Migrate manifests and API clients to use the apiregistration.k8s.io/v1 API version, available since v1.10.
All existing persisted objects are accessible via the new API
No notable changes
First of all, v1 is available for the version you are using (v.1.19).
Secondly, and more important, All existing persisted objects are accessible via the new API and No notable changes. This means that the objects created using v1beta1 don't need to be updated or modified. They will be available and working even if they were created using v1beta1. That is, after upgrading to version v 1.22, you should have no issue, the same objects will simply be accessible (and, I would think, accessed by HPA) as if they had been created using v1. What is more, they may already (in version 1.19) be accessible as v1, as I will explain next, so you can check now if everything is fine.
I have run some quick tests on a v 1.19 GKE cluster and found if something contains apiVersion: apiregistration.k8s.io/v1beta1 (in fact, using exactly what the OP provided in in question:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
name: v1.custom.metrics.k8s.io
spec:
insecureSkipTLSVerify: true
group: custom.metrics.k8s.io
groupPriorityMinimum: 1000
versionPriority: 5
service:
name: services-metrics-service
namespace: test
port: 443
version: v1
except using v1beta1 as apiVersion) is applied, and then a get command is used to obtain what was created $ kubectl get apiservices/v1.custom.metrics.k8s.io --output=json, an object marked with apiVersion v1, not v1beta1, will be obtained ("apiVersion": "apiregistration.k8s.io/v1"). And if, intead, apiVersion .../v1 is used during the creation instead of apiVersion: apiregistration.k8s.io/v1beta1, the same is obtained. If either of those is deleted, the other one is gone. It is the same object behind the scenes. Yet it is marked as v1.
As a result of all the above, you should simply revert back to what you were doing when you deployed using v1beta1, given it worked and will keep on working according to APIService 1.22 documentation. You can also run $ kubectl get apiservices/v1.custom.metrics.k8s.io --output=json command, either $ kubectl get APIService --output=json or $ kubectl get APIService.apiregistration.k8s.io --output=json once deployed v1beta1 version, to understand if object is already marked as using v1 behind the scenes despite having created the object with v1beta1 - like is happening in my case.
Creating objects with v1 is not necessary if they were already created with v1beta1.
I have this Deployment object:
apiVersion: apps/v1
kind: Deployment
metadata:
name: deployment-webserver-nginx
annotations:
description: This is a demo deployment for nginx webserver
labels:
app: deployment-webserver-nginx
spec:
replicas: 3
selector:
matchLabels:
app: deployment-webserver-pods
template:
metadata:
labels:
app: deployment-webserver-pods
spec:
containers:
- name: nginx
image: nginx:alpine
ports:
- containerPort: 80
My understanding on this Deployment object is that any Pod with app:deployment-webserver-pods label will be selected. Of course, this Deployment object is creating 3 replicas, but I wanted to add one more Pod explicitly like this, so I created a Pod object and had its label as app:deployment-webserver-pods, below is its Pod definition:
apiVersion: v1
kind: Pod
metadata:
name: deployment-webserver-nginx-extra-pod
labels:
app: deployment-webserver-pods
spec:
containers:
- name: nginx-alpine-container-1
image: nginx:alpine
ports:
- containerPort: 81
My expectation was that continuously running Deployment Controller will pick this new Pod, and when I do kubectl get deploy then I will see 4 pods running. But that didn't happen.
I even tried to first create this pod with this label, and then created my Deployment and thought that maybe now this explicit Pod will be picked but still that didn't happen.
Doesn't Labels and Selectors work like this?
I know I can scale by deployment to 4 Replicas, but I am trying to understand how Pods / other Kubernetes objects are selected using Labels and Selectors.
From the official docs:
Note: You should not create other Pods whose labels match this
selector, either directly, by creating another Deployment, or by
creating another controller such as a ReplicaSet or a
ReplicationController. If you do so, the first Deployment thinks that
it created these other Pods. Kubernetes does not stop you from doing
this.
As described further in docs, it is not recommended to scale replicas of the deployments using the above approach.
Another important point to note from same section of docs:
If you have multiple controllers that have overlapping selectors, the
controllers will fight with each other and won't behave correctly.
My expectation was that continuously running Deployment Controller will pick this new Pod, and when I do kubectl get deploy then I will see 4 pods running. But that didn't happen.
The Deployment Controller does not work like that, it listen for Deployment-resources and "drive" them to desired state. That typically means, if any change in the template:-part, then a new ReplicaSet is created with the number of replicas. You cannot add a Pod to a Deployment in another way than changing replicas: - each instance is created from the same Pod-template and is identical.
Doesn't Labels and Selectors work like this?
... but I am trying to understand how Pods / other Kubernetes objects are selected using Labels and Selectors.
Yes, Labels and Selectors are used for many things in Kubernetes, but not for everything. When you create a Deployment with a label, and a Pod with the same label and finally a Service with a selector - then the traffic addressed to that Service will distribute traffic to your instances of your Deployment as well as to your extra Pod.
Example:
apiVersion: v1
kind: Service
metadata:
name: my-service
spec:
selector:
app: deployment-webserver-pods
ports:
- protocol: TCP
port: 80
targetPort: 8080
Labels and Selector are also useful for management when using e.g. kubectl. You can add labels for Teams or e.g. App, then you can select all Deployments or Pods belonging to that Team or App (e.g. if the app consist of App-deployment and a cache-deployment), e.g:
kubectl get pods -l team=myteam,app=customerservice
My expectation was that continuously running Deployment Controller
will pick this new Pod, and when I do kubectl get deploy then I will
see 4 pods running. But that didn't happen.
Kubernetes is a system that operates "Declaratively" and not "Imperatively" which means you write down the desired state of the application in the cluster typically through a YAML file, and these declared desired states define all of the pieces of your application.
If a cluster were to configured imperatively like the way you are expecting it to be, it would have been very difficult to understand and replicate how the cluster came to be in that state.
Just to add in the above explanations that if we are trying to manually create pod and manage then what is the purpose of having controllers in K8s.
My expectation was that continuously running Deployment Controller
will pick this new Pod, and when I do kubectl get deploy then I will
see 4 pods running. But that didn't happen.
As per your yaml replicas:3 was already set so deployment would not take a new pod as the 4th replica.
I'm experimenting with this, and I'm noticing a difference in behavior that I'm having trouble understanding, namely between running kubectl proxy from within a pod vs running it in a different pod.
The sample configuration run kubectl proxy and the container that needs it* in the same pod on a daemonset, i.e.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
# ...
spec:
template:
metadata:
# ...
spec:
containers:
# this container needs kubectl proxy to be running:
- name: l5d
# ...
# so, let's run it:
- name: kube-proxy
image: buoyantio/kubectl:v1.8.5
args:
- "proxy"
- "-p"
- "8001"
When doing this on my cluster, I get the expected behavior. However, I will run other services that also need kubectl proxy, so I figured I'd rationalize that into its own daemon set to ensure it's running on all nodes. I thus removed the kube-proxy container and deployed the following daemon set:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: kube-proxy
labels:
app: kube-proxy
spec:
template:
metadata:
labels:
app: kube-proxy
spec:
containers:
- name: kube-proxy
image: buoyantio/kubectl:v1.8.5
args:
- "proxy"
- "-p"
- "8001"
In other words, the same container configuration as previously, but now running in independent pods on each node instead of within the same pod. With this configuration "stuff doesn't work anymore"**.
I realize the solution (at least for now) is to just run the kube-proxy container in any pod that needs it, but I'd like to know why I need to. Why isn't just running it in a daemonset enough?
I've tried to find more information about running kubectl proxy like this, but my search results drown in results about running it to access a remote cluster from a local environment, i.e. not at all what I'm after.
I include these details not because I think they're relevant, but because they might be even though I'm convinced they're not:
*) a Linkerd ingress controller, but I think that's irrelevant
**) in this case, the "working" state is that the ingress controller complains that the destination is unknown because there's no matching ingress rule, while the "not working" state is a network timeout.
namely between running kubectl proxy from within a pod vs running it in a different pod.
Assuming your cluster has an software defined network, such as flannel or calico, a Pod has its own IP and all containers within a Pod share the same networking space. Thus:
containers:
- name: c0
command: ["curl", "127.0.0.1:8001"]
- name: c1
command: ["kubectl", "proxy", "-p", "8001"]
will work, whereas in a DaemonSet, they are by definition not in the same Pod and thus the hypothetical c0 above would need to use the DaemonSet's Pod's IP to contact 8001. That story is made more complicated by the fact that kubectl proxy by default only listens on 127.0.0.1, so you would need to alter the DaemonSet's Pod's kubectl proxy to include --address='0.0.0.0' --accept-hosts='.*' to even permit such cross-Pod communication. I believe you also need to declare the ports: array in the DaemonSet configuration, since you are now exposing that port into the cluster, but I'd have to double-check whether ports: is merely polite, or is actually required.
I learned recently that Kubernetes has a feature called Init Containers. Awesome, because I can use this feature to wait for my postgres service and create/migrate the database before my web application service runs.
However, it appears that Init Containers can only be configured in a Pod yaml file. Is there a way I can do this via a Deployment yaml file? Or do I have to choose?
To avoid confusion, ill answer your specific question. i agree with oswin that you may want to consider another method.
Yes, you can use init containers with a deployment. this is an example using the old style (pre 1.6) but it should work
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: 'nginx'
spec:
replicas: 1
selector:
matchLabels:
app: 'nginx'
template:
metadata:
labels:
app: 'nginx'
annotations:
pod.beta.kubernetes.io/init-containers: '[
{
"name": "install",
"image": "busybox",
"imagePullPolicy": "IfNotPresent",
"command": ["wget", "-O", "/application/index.html", "http://kubernetes.io/index.html"],
"volumeMounts": [
{
"name": "application",
"mountPath": "/application"
}
]
}
]'
spec:
volumes:
- name: 'application'
emptyDir: {}
containers:
- name: webserver
image: 'nginx'
ports:
- name: http
containerPort: 80
volumeMounts:
- name: 'application'
mountPath: '/application'
You probably want to use readiness probes instead of init containers for this use case. Check out this link and a blog. Also note that a deployment will not send traffic to a pod that is not reported ready - If that was your worry.
This is a well known pattern and a readiness probe in the web server would simply check the DB endpoint / data availability before reporting ready. This is a simple solution as opposed to the complexity of an extra init container and has the advantage of detecting DB outages correctly as well.
I have created a Kubernetes autoscaler, but I need to change its parameters. How do I update it?
I've tried the following, but it fails:
✗ kubectl autoscale -f docker/production/web-controller.yaml --min=2 --max=6
Error from server: horizontalpodautoscalers.extensions "web" already exists
You can always interactively edit the resources in your cluster. For your autoscale controller called web, you can edit it via:
kubectl edit hpa web
If you're looking for a more programmatic way to update your horizontal pod autoscaler, you would have better luck describing your autoscaler entity in a yaml file, as well. For example, here's a simple Replication Controller, paired with a Horizontal Pod Autoscale entity:
apiVersion: v1
kind: ReplicationController
metadata:
name: nginx
spec:
replicas: 2
template:
metadata:
labels:
run: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: nginx
namespace: default
spec:
maxReplicas: 3
minReplicas: 2
scaleTargetRef:
apiVersion: v1
kind: ReplicationController
name: nginx
With those contents in a file called nginx.yaml, updating the autoscaler could be done via kubectl apply -f nginx.yaml.
You can use the kubectl patch command as well, to see its current status
kubectl get hpa <autoscaler-name-here> -o json
An example output:
{
"apiVersion": "autoscaling/v1",
"kind": "HorizontalPodAutoscaler",
"metadata": {
...
"name": "your-auto-scaler",
"namespace": "your-namespace",
...
},
"spec": {
"maxReplicas": 50,
"minReplicas": 2,
"scaleTargetRef": {
"apiVersion": "extensions/v1beta1",
"kind": "Deployment",
"name": "your-deployment"
},
"targetCPUUtilizationPercentage": 40
},
"status": {
"currentReplicas": 1,
"desiredReplicas": 2,
"lastScaleTime": "2017-12-13T16:23:41Z"
}
}
If you want to update the minimum number of replicas:
kubectl -n your-namespace patch hpa your-auto-scaler --patch '{"spec":{"minReplicas":1}}'
The same logic applies to other parameters found in the autoscaler configuration, change minReplicas to maxReplicas if you want to update the maximum number of allowed replicas.
First delete the autoscaler and then re-create it:
✗ kubectl delete hpa web
✗ kubectl autoscale -f docker/production/web-controller.yaml --min=2 --max=6
I tried multiple options but the one which works the best is
First delete the existing hpa
kubectl delete hpa web
then recreate another one
kubectl autoscale -f docker/production/web-controller.yaml --min=2 --max=6