Enable use of images from the local library on Kubernetes

I'm following the tutorial at https://docs.openfaas.com/tutorials/first-python-function/.
Currently, I have the right image:
$ docker images | grep hello-openfaas
wm/hello-openfaas latest bd08d01ce09b 34 minutes ago 65.2MB
$ faas-cli deploy -f ./hello-openfaas.yml
Deploying: hello-openfaas.
WARNING! You are not using an encrypted connection to the gateway, consider using HTTPS.
Deployed. 202 Accepted.
URL: http://IP:8099/function/hello-openfaas
There is a step that forewarns me to do some setup. In my case I'm using Kubernetes with minikube and don't want to push to a remote container registry, so I should enable the use of images from the local library on Kubernetes. I see the hint:
see the helm chart for how to set the ImagePullPolicy
I'm not sure how to configure this correctly, and the final result indicates that I failed.
Unsurprisingly, I couldn't access the function service. I found some clues in https://docs.openfaas.com/deployment/troubleshooting/#openfaas-didnt-start which might help to diagnose the problem.
$ kubectl logs -n openfaas-fn deploy/hello-openfaas
Error from server (BadRequest): container "hello-openfaas" in pod "hello-openfaas-558f99477f-wd697" is waiting to start: trying and failing to pull image
$ kubectl describe -n openfaas-fn deploy/hello-openfaas
Name: hello-openfaas
Namespace: openfaas-fn
CreationTimestamp: Wed, 16 Mar 2022 14:59:49 +0800
Labels: faas_function=hello-openfaas
Annotations: deployment.kubernetes.io/revision: 1
prometheus.io.scrape: false
Selector: faas_function=hello-openfaas
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 0 max unavailable, 1 max surge
Pod Template:
Labels: faas_function=hello-openfaas
Annotations: prometheus.io.scrape: false
Containers:
hello-openfaas:
Image: wm/hello-openfaas:latest
Port: 8080/TCP
Host Port: 0/TCP
Liveness: http-get http://:8080/_/health delay=2s timeout=1s period=2s #success=1 #failure=3
Readiness: http-get http://:8080/_/health delay=2s timeout=1s period=2s #success=1 #failure=3
Environment:
fprocess: python3 index.py
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing False ProgressDeadlineExceeded
OldReplicaSets: <none>
NewReplicaSet: hello-openfaas-558f99477f (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 29m deployment-controller Scaled up replica set hello-openfaas-558f99477f to 1
hello-openfaas.yml
version: 1.0
provider:
name: openfaas
gateway: http://IP:8099
functions:
hello-openfaas:
lang: python3
handler: ./hello-openfaas
image: wm/hello-openfaas:latest
imagePullPolicy: Never
I created a new project, hello-openfaas2, to reproduce this error:
$ faas-cli new --lang python3 hello-openfaas2 --prefix="wm"
Folder: hello-openfaas2 created.
# I add `imagePullPolicy: Never` to `hello-openfaas2.yml`
$ faas-cli build -f ./hello-openfaas2.yml
$ faas-cli deploy -f ./hello-openfaas2.yml
Deploying: hello-openfaas2.
WARNING! You are not using an encrypted connection to the gateway, consider using HTTPS.
Deployed. 202 Accepted.
URL: http://192.168.1.3:8099/function/hello-openfaas2
$ kubectl logs -n openfaas-fn deploy/hello-openfaas2
Error from server (BadRequest): container "hello-openfaas2" in pod "hello-openfaas2-7c67488865-7d7vm" is waiting to start: image can't be pulled
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-kp7vf 1/1 Running 0 47h
...
openfaas-fn env-6c79f7b946-bzbtm 1/1 Running 0 4h28m
openfaas-fn figlet-54db496f88-957xl 1/1 Running 0 18h
openfaas-fn hello-openfaas-547857b9d6-z277c 0/1 ImagePullBackOff 0 127m
openfaas-fn hello-openfaas-7b6946b4f9-hcvq4 0/1 ImagePullBackOff 0 165m
openfaas-fn hello-openfaas2-7c67488865-qmrkl 0/1 ImagePullBackOff 0 13m
openfaas-fn hello-openfaas3-65847b8b67-b94kd 0/1 ImagePullBackOff 0 97m
openfaas-fn hello-python-554b464498-zxcdv 0/1 ErrImagePull 0 3h23m
openfaas-fn hello-python-8698bc68bd-62gh9 0/1 ImagePullBackOff 0 3h25m
From https://docs.openfaas.com/reference/yaml/, I learned that I had put imagePullPolicy in the wrong place; there is no such keyword in the stack file's schema.
I also tried eval $(minikube docker-env) and still get the same error.
I have a feeling that faas-cli deploy could be replaced by helm: both mean to run the image (whether remote or local) in the Kubernetes cluster, so I could use the helm chart to set the pull policy there. Even though the details are still not clear to me, this discovery inspires me.
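An alternative way to get a locally built image into the cluster is minikube's built-in image loading (a sketch, assuming minikube v1.10 or newer, which provides the image load subcommand):
$ minikube image load wm/hello-openfaas2:0.1
# verify the image is now visible inside the minikube VM
$ minikube ssh -- docker images | grep hello-openfaas2
With the image present in the cluster's runtime, a non-Always pull policy would let the kubelet use the cached copy instead of reaching out to a registry.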
So far, after eval $(minikube docker-env)
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
wm/hello-openfaas2 0.1 03c21bd96d5e About an hour ago 65.2MB
python 3-alpine 69fba17b9bae 12 days ago 48.6MB
ghcr.io/openfaas/figlet latest ca5eef0de441 2 weeks ago 14.8MB
ghcr.io/openfaas/alpine latest 35f3d4be6bb8 2 weeks ago 14.2MB
ghcr.io/openfaas/faas-netes 0.14.2 524b510505ec 3 weeks ago 77.3MB
k8s.gcr.io/kube-apiserver v1.23.3 f40be0088a83 7 weeks ago 135MB
k8s.gcr.io/kube-controller-manager v1.23.3 b07520cd7ab7 7 weeks ago 125MB
k8s.gcr.io/kube-scheduler v1.23.3 99a3486be4f2 7 weeks ago 53.5MB
k8s.gcr.io/kube-proxy v1.23.3 9b7cc9982109 7 weeks ago 112MB
ghcr.io/openfaas/gateway 0.21.3 ab4851262cd1 7 weeks ago 30.6MB
ghcr.io/openfaas/basic-auth 0.21.3 16e7168a17a3 7 weeks ago 14.3MB
k8s.gcr.io/etcd 3.5.1-0 25f8c7f3da61 4 months ago 293MB
ghcr.io/openfaas/classic-watchdog 0.2.0 6f97aa96da81 4 months ago 8.18MB
k8s.gcr.io/coredns/coredns v1.8.6 a4ca41631cc7 5 months ago 46.8MB
k8s.gcr.io/pause 3.6 6270bb605e12 6 months ago 683kB
ghcr.io/openfaas/queue-worker 0.12.2 56e7216201bc 7 months ago 7.97MB
kubernetesui/dashboard v2.3.1 e1482a24335a 9 months ago 220MB
kubernetesui/metrics-scraper v1.0.7 7801cfc6d5c0 9 months ago 34.4MB
nats-streaming 0.22.0 12f2d32e0c9a 9 months ago 19.8MB
gcr.io/k8s-minikube/storage-provisioner v5 6e38f40d628d 11 months ago 31.5MB
functions/markdown-render latest 93b5da182216 2 years ago 24.6MB
functions/hubstats latest 01affa91e9e4 2 years ago 29.3MB
functions/nodeinfo latest 2fe8a87bf79c 2 years ago 71.4MB
functions/alpine latest 46c6f6d74471 2 years ago 21.5MB
prom/prometheus v2.11.0 b97ed892eb23 2 years ago 126MB
prom/alertmanager v0.18.0 ce3c87f17369 2 years ago 51.9MB
alexellis2/openfaas-colorization 0.4.1 d36b67b1b5c1 2 years ago 1.84GB
rorpage/text-to-speech latest 5dc20810eb54 2 years ago 86.9MB
stefanprodan/faas-grafana 4.6.3 2a4bd9caea50 4 years ago 284MB
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-kp7vf 1/1 Running 0 6d
kube-system etcd-minikube 1/1 Running 0 6d
kube-system kube-apiserver-minikube 1/1 Running 0 6d
kube-system kube-controller-manager-minikube 1/1 Running 0 6d
kube-system kube-proxy-5m8lr 1/1 Running 0 6d
kube-system kube-scheduler-minikube 1/1 Running 0 6d
kube-system storage-provisioner 1/1 Running 1 (6d ago) 6d
kubernetes-dashboard dashboard-metrics-scraper-58549894f-97tsv 1/1 Running 0 5d7h
kubernetes-dashboard kubernetes-dashboard-ccd587f44-lkwcx 1/1 Running 0 5d7h
openfaas-fn base64-6bdbcdb64c-djz8f 1/1 Running 0 5d1h
openfaas-fn colorise-85c74c686b-2fz66 1/1 Running 0 4d5h
openfaas-fn echoit-5d7df6684c-k6ljn 1/1 Running 0 5d1h
openfaas-fn env-6c79f7b946-bzbtm 1/1 Running 0 4d5h
openfaas-fn figlet-54db496f88-957xl 1/1 Running 0 4d19h
openfaas-fn hello-openfaas-547857b9d6-z277c 0/1 ImagePullBackOff 0 4d3h
openfaas-fn hello-openfaas-7b6946b4f9-hcvq4 0/1 ImagePullBackOff 0 4d3h
openfaas-fn hello-openfaas2-5c6f6cb5d9-24hkz 0/1 ImagePullBackOff 0 9m22s
openfaas-fn hello-openfaas2-8957bb47b-7cgjg 0/1 ImagePullBackOff 0 2d22h
openfaas-fn hello-openfaas3-65847b8b67-b94kd 0/1 ImagePullBackOff 0 4d2h
openfaas-fn hello-python-6d6976845f-cwsln 0/1 ImagePullBackOff 0 3d19h
openfaas-fn hello-python-b577cb8dc-64wf5 0/1 ImagePullBackOff 0 3d9h
openfaas-fn hubstats-b6cd4dccc-z8tvl 1/1 Running 0 5d1h
openfaas-fn markdown-68f69f47c8-w5m47 1/1 Running 0 5d1h
openfaas-fn nodeinfo-d48cbbfcc-hfj79 1/1 Running 0 5d1h
openfaas-fn openfaas2-fun 1/1 Running 0 15s
openfaas-fn text-to-speech-74ffcdfd7-997t4 0/1 CrashLoopBackOff 2235 (3s ago) 4d5h
openfaas-fn wordcount-6489865566-cvfzr 1/1 Running 0 5d1h
openfaas alertmanager-88449c789-fq2rg 1/1 Running 0 3d1h
openfaas basic-auth-plugin-75fd7d69c5-zw4jh 1/1 Running 0 3d2h
openfaas gateway-5c4bb7c5d7-n8h27 2/2 Running 0 3d2h
openfaas grafana 1/1 Running 0 4d8h
openfaas nats-647b476664-hkr7p 1/1 Running 0 3d2h
openfaas prometheus-687648749f-tl8jp 1/1 Running 0 3d1h
openfaas queue-worker-7777ffd7f6-htx6t 1/1 Running 0 3d2h
$ kubectl get -o yaml -n openfaas-fn deploy/hello-openfaas2
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "6"
prometheus.io.scrape: "false"
creationTimestamp: "2022-03-17T12:47:35Z"
generation: 6
labels:
faas_function: hello-openfaas2
name: hello-openfaas2
namespace: openfaas-fn
resourceVersion: "400833"
uid: 9c4e9d26-23af-4f93-8538-4e2d96f0d7e0
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
faas_function: hello-openfaas2
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
annotations:
prometheus.io.scrape: "false"
creationTimestamp: null
labels:
faas_function: hello-openfaas2
uid: "969512830"
name: hello-openfaas2
spec:
containers:
- env:
- name: fprocess
value: python3 index.py
image: wm/hello-openfaas2:0.1
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /_/health
port: 8080
scheme: HTTP
initialDelaySeconds: 2
periodSeconds: 2
successThreshold: 1
timeoutSeconds: 1
name: hello-openfaas2
ports:
- containerPort: 8080
name: http
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /_/health
port: 8080
scheme: HTTP
initialDelaySeconds: 2
periodSeconds: 2
successThreshold: 1
timeoutSeconds: 1
resources: {}
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
enableServiceLinks: false
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
status:
conditions:
- lastTransitionTime: "2022-03-17T12:47:35Z"
lastUpdateTime: "2022-03-17T12:47:35Z"
message: Deployment does not have minimum availability.
reason: MinimumReplicasUnavailable
status: "False"
type: Available
- lastTransitionTime: "2022-03-20T12:16:56Z"
lastUpdateTime: "2022-03-20T12:16:56Z"
message: ReplicaSet "hello-openfaas2-5d6c7c7fb4" has timed out progressing.
reason: ProgressDeadlineExceeded
status: "False"
type: Progressing
observedGeneration: 6
replicas: 2
unavailableReplicas: 2
updatedReplicas: 1
In one shell,
docker@minikube:~$ docker run --name wm -ti wm/hello-openfaas2:0.1
2022/03/20 13:04:52 Version: 0.2.0 SHA: 56bf6aac54deb3863a690f5fc03a2a38e7d9e6ef
2022/03/20 13:04:52 Timeouts: read: 5s write: 5s hard: 0s health: 5s.
2022/03/20 13:04:52 Listening on port: 8080
...
and in another shell:
docker@minikube:~$ docker ps | grep wm
d7796286641c wm/hello-openfaas2:0.1 "fwatchdog" 3 minutes ago Up 3 minutes (healthy) 8080/tcp wm

When you specify an image without a registry URL, it defaults to Docker Hub. And when you use the :latest tag, the pull policy defaults to Always, so Kubernetes tries to pull the image on every pod creation.
So to use locally built images, don't use the latest tag.
To make minikube use images built on your local machine, you need to do a few things:
Point your docker client to the minikube VM's docker daemon: eval $(minikube docker-env)
Configure the image pull policy: imagePullPolicy: Never
There is a flag to allow insecure registries in the minikube VM. It must be specified when you create the machine: minikube start --insecure-registry
Note that you have to run eval $(minikube docker-env) in each terminal you want to use, since it only sets the environment variables for the current shell session.
This flow works:
# Start minikube and set docker env
minikube start
eval $(minikube docker-env)
# Build image
docker build -t foo:1.0 .
# Run in minikube
kubectl run hello-foo --image=foo:1.0 --image-pull-policy=Never
You can read more at the minikube docs.

If your image has the latest tag, the Pod's imagePullPolicy is automatically set to Always, and each time the pod is created Kubernetes tries to pull the newest image.
Try not tagging the image as latest, or manually set the Pod's imagePullPolicy to Never.
If you're using a static manifest to create the Pod, the setting looks like the following:
containers:
- name: test-container
image: testImage:latest
imagePullPolicy: Never
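For context, that snippet expands to a complete minimal manifest like this (a sketch; test-pod is a hypothetical name, and testImage:latest is the placeholder from the snippet above, which in practice must be a lowercase image reference):
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-pod             # hypothetical name
spec:
  containers:
  - name: test-container
    image: testImage:latest  # placeholder; substitute your real (lowercase) image
    imagePullPolicy: Never   # never pull; the image must already exist on the node
EOF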

From comments on the initial post, I gathered that:
the issue is that the container runtime of your minikube cluster is distinct from that of your host, where you built your function image (not always the case: minikube can run with the docker driver, which, I think, implies the host docker runtime is shared with the cluster);
the container runtime in use by minikube is docker (it could have been cri-o, in which case the following steps won't apply; those using cri-o may switch to docker, as I'm not sure image loading is possible with cri-o).
You can try to build your function image from a shell inside your Minikube instance.
Or you can:
export your image ( docker save -o image.tar my/image )
copy this to your minikube instance ( scp -i ~/.minikube/machines/minikube/id_rsa image.tar docker@$(minikube ip): )
open a shell ( ssh -i ~/.minikube/machines/minikube/id_rsa docker@$(minikube ip) )
load that image ( docker load -i image.tar ); the full sequence is consolidated below
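Put together, that export/import sequence looks like this (using the image name from the question, and assuming the default minikube VM driver and ssh key location):
docker save -o image.tar wm/hello-openfaas2:0.1
scp -i ~/.minikube/machines/minikube/id_rsa image.tar docker@$(minikube ip):
ssh -i ~/.minikube/machines/minikube/id_rsa docker@$(minikube ip) docker load -i image.tar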
Then, make sure your openfaas was deployed with faasnetes.imagePullPolicy=Never or IfNotPresent, as I doubt that setting the imagePullPolicy directly in your function definition would do anything (I haven't read about this in their docs, which instead mention, as you pointed out, overriding it during the openfaas deployment). Checking your deployment's yaml definition ( kubectl get -o yaml -n openfaas-fn deploy/hello-openfaas ) should confirm you're not using Always. If that's already the case, no need to dig further: just make sure your image is imported, with name and tag matching those referenced by your function.
... Answering your last comment: you're not sure how openfaas was deployed. One way to make sure the proper option was set is to look at the gateway deployment in the openfaas namespace ( kubectl get -o yaml -n openfaas deploy/gateway ).
In there, you should find a container named "operator". That container should include a few environment variables, one of which may be image_pull_policy (we can see this by looking at the Chart sources). You want that environment variable to be set to IfNotPresent; add it or edit it if needed.
Checking your last edit, we can see the Deployment object created by your function says:
image: wm/hello-openfaas2:0.1
imagePullPolicy: Always
So for sure: you do need to reconfigure openfaas, adding that image_pull_policy environment variable.
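A sketch of that reconfiguration, assuming OpenFaaS was installed as a release named openfaas from the official Helm chart (the faasnetes.imagePullPolicy value is the one mentioned above; if it was deployed another way, setting the operator's environment variable directly also works):
# via the chart, keeping all other values as-is
helm upgrade openfaas openfaas/openfaas \
  --namespace openfaas \
  --reuse-values \
  --set faasnetes.imagePullPolicy=IfNotPresent
# or by editing the operator's environment variable in place
kubectl set env -n openfaas deploy/gateway -c operator image_pull_policy=IfNotPresent
Either way, redeploy the function afterwards so its Deployment is regenerated with the new policy.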

Related

Kubernetes available schedulers

How would I display the available schedulers in my cluster, in order to use a non-default one via the schedulerName field?
Any link to a document describing how to "install" and use a custom scheduler is highly appreciated :)
Thanks in advance
Schedulers can be found among your kube-system pods. You can then filter the output to your needs with kube-scheduler as the search key:
➜ ~ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-6955765f44-9wfkp 0/1 Completed 15 264d
coredns-6955765f44-jmz9j 1/1 Running 16 264d
etcd-acid-fuji 1/1 Running 17 264d
kube-apiserver-acid-fuji 1/1 Running 6 36d
kube-controller-manager-acid-fuji 1/1 Running 21 264d
kube-proxy-hs2qb 1/1 Running 0 177d
kube-scheduler-acid-fuji 1/1 Running 21 264d
You can retrieve the yaml file with:
➜ ~ kubectl get pods -n kube-system <scheduler pod name> -oyaml
If you bootstrapped your cluster with kubeadm, you may also find the yaml files in /etc/kubernetes/manifests:
➜ manifests sudo cat /etc/kubernetes/manifests/kube-scheduler.yaml
---
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
component: kube-scheduler
tier: control-plane
name: kube-scheduler
namespace: kube-system
spec:
containers:
- command:
- kube-scheduler
- --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
- --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
- --bind-address=127.0.0.1
- --kubeconfig=/etc/kubernetes/scheduler.conf
- --leader-elect=true
image: k8s.gcr.io/kube-scheduler:v1.17.6
imagePullPolicy: IfNotPresent
...
The location on minikube is similar, but you have to log in to minikube's virtual machine first with minikube ssh.
For more reading, please have a look at how to configure multiple schedulers and how to write a custom scheduler.
You can try this one:
kubectl get pods --all-namespaces | grep scheduler
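And once you've found a scheduler, using a non-default one from a Pod is just a matter of setting schedulerName in the spec. A minimal sketch, where my-custom-scheduler is a hypothetical name standing in for whatever your query returned:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: uses-custom-scheduler
spec:
  schedulerName: my-custom-scheduler   # hypothetical; substitute your scheduler's name
  containers:
  - name: app
    image: nginx
EOF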

kubernetes pending pod priority

I have the following pods on my kubernetes (1.18.3) cluster:
NAME READY STATUS RESTARTS AGE
pod1 1/1 Running 0 14m
pod2 1/1 Running 0 14m
pod3 0/1 Pending 0 14m
pod4 0/1 Pending 0 14m
pod3 and pod4 cannot start because the node has capacity for 2 pods only. When pod1 finishes and quits, then the scheduler picks either pod3 or pod4 and starts it. So far so good.
However, I also have a high priority pod (hpod) that I'd like to start before pod3 or pod4 when either of the running pods finishes and quits.
So I created a priority class, following the kubernetes docs:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-no-preemption
value: 1000000
preemptionPolicy: Never
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
I've created the following pod yaml:
apiVersion: v1
kind: Pod
metadata:
name: hpod
labels:
app: hpod
spec:
containers:
- name: hpod
image: ...
resources:
requests:
cpu: "500m"
memory: "500Mi"
limits:
cpu: "500m"
memory: "500Mi"
priorityClassName: high-priority-no-preemption
Now the problem is that when I start the high-priority pod with kubectl apply -f hpod.yaml, the scheduler terminates a running pod to allow the high-priority pod to start, even though I've set preemptionPolicy: Never.
The expected behaviour would be to postpone starting hpod until a currently running pod finishes, and when it does, to let hpod start before pod3 or pod4.
What am I doing wrong?
Prerequisites:
This solution was tested on Kubernetes v1.18.3, docker 19.03 and Ubuntu 18.
A text editor is also required (e.g. sudo apt-get install vim).
In the Kubernetes documentation, under How to disable preemption, you can find this note:
Note: In Kubernetes 1.15 and later, if the feature NonPreemptingPriority is enabled, PriorityClasses have the option to set preemptionPolicy: Never. This will prevent pods of that PriorityClass from preempting other pods.
Also, under Non-preempting PriorityClass, you have this information:
The use of the PreemptionPolicy field requires the NonPreemptingPriority feature gate to be enabled.
If you then check the Feature Gates reference, you will find that NonPreemptingPriority defaults to false, so it is disabled by default.
Output with your current configuration:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-normal 1/1 Running 0 32s
nginx-normal-2 1/1 Running 0 32s
$ kubectl apply -f prio.yaml
pod/nginx-priority created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-normal-2 1/1 Running 0 48s
nginx-priority 1/1 Running 0 8s
To enable preemptionPolicy: Never, you need to add --feature-gates=NonPreemptingPriority=true to 3 files:
/etc/kubernetes/manifests/kube-apiserver.yaml
/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/kubernetes/manifests/kube-scheduler.yaml
To check whether this feature gate is enabled, you can use these commands:
ps aux | grep apiserver | grep feature-gates
ps aux | grep scheduler | grep feature-gates
ps aux | grep controller-manager | grep feature-gates
For quite detailed information on why you have to edit those files, please check this Github thread.
$ sudo su
# cd /etc/kubernetes/manifests/
# ls
etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml
Use your text editor to add the feature gate to those files:
# vi kube-apiserver.yaml
and add - --feature-gates=NonPreemptingPriority=true under spec.containers.command, as in the example below:
spec:
containers:
- command:
- kube-apiserver
- --feature-gates=NonPreemptingPriority=true
- --advertise-address=10.154.0.31
Do the same with the 2 other files. After that, you can check whether the flags were applied:
$ ps aux | grep apiserver | grep feature-gates
root 26713 10.4 5.2 565416 402252 ? Ssl 14:50 0:17 kube-apiserver --feature-gates=NonPreemptingPriority=true --advertise-address=10.154.0.31
Now you have to redeploy your PriorityClass.
$ kubectl get priorityclass
NAME VALUE GLOBAL-DEFAULT AGE
high-priority-no-preemption 1000000 false 12m
system-cluster-critical 2000000000 false 23m
system-node-critical 2000001000 false 23m
$ kubectl delete priorityclass high-priority-no-preemption
priorityclass.scheduling.k8s.io "high-priority-no-preemption" deleted
$ kubectl apply -f class.yaml
priorityclass.scheduling.k8s.io/high-priority-no-preemption created
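To confirm the recreated class actually carries the non-preempting policy, a quick check (a sketch; preemptionPolicy is a top-level field on PriorityClass, so jsonpath can read it directly):
$ kubectl get priorityclass high-priority-no-preemption -o jsonpath='{.preemptionPolicy}'
Never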
The last step is to deploy a pod with this PriorityClass.
TEST
$ kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-normal 1/1 Running 0 4m4s
nginx-normal-2 1/1 Running 0 18m
$ kubectl apply -f prio.yaml
pod/nginx-priority created
$ kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-normal 1/1 Running 0 5m17s
nginx-normal-2 1/1 Running 0 20m
nginx-priority 0/1 Pending 0 67s
$ kubectl delete po nginx-normal-2
pod "nginx-normal-2" deleted
$ kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-normal 1/1 Running 0 5m55s
nginx-priority 1/1 Running 0 105s

Argo sample workflows stuck in the pending state

I'm following the Argo Workflows Getting Started documentation. Everything goes smoothly until I run the first sample workflow as described in 4. Run Sample Workflows. The workflow just gets stuck in the Pending state:
vagrant@master:~$ argo submit --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/hello-world.yaml
Name: hello-world-z4lbs
Namespace: default
ServiceAccount: default
Status: Pending
Created: Thu May 14 12:36:45 +0000 (now)
vagrant@master:~$ argo list
NAME STATUS AGE DURATION PRIORITY
hello-world-z4lbs Pending 27m 0s 0
Here it was mentioned that taints on the master node may be the problem, so I untainted the master node:
vagrant@master:~$ kubectl taint nodes --all node-role.kubernetes.io/master-
node/master untainted
taint "node-role.kubernetes.io/master" not found
taint "node-role.kubernetes.io/master" not found
Then I deleted the pending workflow and resubmitted it, but it got stuck in the pending state again.
The details of the newly submitted workflow, which is also stuck:
vagrant@master:~$ kubectl describe workflow hello-world-8kvmb
Name: hello-world-8kvmb
Namespace: default
Labels: <none>
Annotations: <none>
API Version: argoproj.io/v1alpha1
Kind: Workflow
Metadata:
Creation Timestamp: 2020-05-14T13:57:44Z
Generate Name: hello-world-
Generation: 1
Managed Fields:
API Version: argoproj.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:generateName:
f:spec:
.:
f:arguments:
f:entrypoint:
f:templates:
f:status:
.:
f:finishedAt:
f:startedAt:
Manager: argo
Operation: Update
Time: 2020-05-14T13:57:44Z
Resource Version: 16780
Self Link: /apis/argoproj.io/v1alpha1/namespaces/default/workflows/hello-world-8kvmb
UID: aa82d005-b7ac-411f-9d0b-93f34876b673
Spec:
Arguments:
Entrypoint: whalesay
Templates:
Arguments:
Container:
Args:
hello world
Command:
cowsay
Image: docker/whalesay:latest
Name:
Resources:
Inputs:
Metadata:
Name: whalesay
Outputs:
Status:
Finished At: <nil>
Started At: <nil>
Events: <none>
While trying to get the workflow-controller logs, I get the following error:
vagrant@master:~$ kubectl logs -n argo -l app=workflow-controller
Error from server (BadRequest): container "workflow-controller" in pod "workflow-controller-6c4787844c-lbksm" is waiting to start: ContainerCreating
The details for the corresponding workflow-controller pod:
vagrant@master:~$ kubectl -n argo describe pods/workflow-controller-6c4787844c-lbksm
Name: workflow-controller-6c4787844c-lbksm
Namespace: argo
Priority: 0
Node: node-1/192.168.50.11
Start Time: Thu, 14 May 2020 12:08:29 +0000
Labels: app=workflow-controller
pod-template-hash=6c4787844c
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/workflow-controller-6c4787844c
Containers:
workflow-controller:
Container ID:
Image: argoproj/workflow-controller:v2.8.0
Image ID:
Port: <none>
Host Port: <none>
Command:
workflow-controller
Args:
--configmap
workflow-controller-configmap
--executor-image
argoproj/argoexec:v2.8.0
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from argo-token-pz4fd (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
argo-token-pz4fd:
Type: Secret (a volume populated by a Secret)
SecretName: argo-token-pz4fd
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 7m17s (x4739 over 112m) kubelet, node-1 Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 2m18s (x4950 over 112m) kubelet, node-1 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1bd1fd11dfe677c749b4a1260c29c2f8cff0d55de113d154a822e68b41f9438e" network for pod "workflow-controller-6c4787844c-lbksm": networkPlugin cni failed to set up pod "workflow-controller-6c4787844c-lbksm_argo" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
I'm running Argo 2.8:
vagrant@master:~$ argo version
argo: v2.8.0
BuildDate: 2020-05-11T22:55:16Z
GitCommit: 8f696174746ed01b9bf1941ad03da62d312df641
GitTreeState: clean
GitTag: v2.8.0
GoVersion: go1.13.4
Compiler: gc
Platform: linux/amd64
I have checked the cluster status and it looks OK:
vagrant@master:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 95m v1.18.2
node-1 Ready <none> 92m v1.18.2
node-2 Ready <none> 92m v1.18.2
As to the K8s cluster installation, I created it using Vagrant as described here, the only differences being:
libvirt as the provider
a newer version of Ubuntu: generic/ubuntu1804
a newer version of Calico: v3.14
Any idea why the workflows get stuck in the pending state and how to fix it?
Workflows start in the Pending state and then are moved through their steps by the workflow-controller pod (which is installed in the cluster as part of Argo).
The workflow-controller pod is stuck in ContainerCreating. kubectl describe po {workflow-controller pod} reveals a Calico-related network error.
As mentioned in the comments, it looks like a common Calico error. Once you clear that up, your hello-world workflow should execute just fine.
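To see what Calico itself is complaining about, the following commands are a reasonable starting point (a sketch, assuming the standard Calico manifests, which label the node pods with k8s-app=calico-node):
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
kubectl logs -n kube-system -l k8s-app=calico-node -c calico-node --tail=20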
Note from OP: Further debugging confirms the Calico problem (Calico nodes are not in the running state):
vagrant@master:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
argo argo-server-84946785b-94bfs 0/1 ContainerCreating 0 3h59m
argo workflow-controller-6c4787844c-lbksm 0/1 ContainerCreating 0 3h59m
kube-system calico-kube-controllers-74d45555dd-zhkp6 0/1 CrashLoopBackOff 56 3h59m
kube-system calico-node-2n9kt 0/1 CrashLoopBackOff 72 3h59m
kube-system calico-node-b8sb8 0/1 Running 70 3h56m
kube-system calico-node-pslzs 0/1 CrashLoopBackOff 67 3h56m
kube-system coredns-66bff467f8-rmxsp 0/1 ContainerCreating 0 3h59m
kube-system coredns-66bff467f8-z4lbq 0/1 ContainerCreating 0 3h59m
kube-system etcd-master 1/1 Running 2 3h59m
kube-system kube-apiserver-master 1/1 Running 2 3h59m
kube-system kube-controller-manager-master 1/1 Running 2 3h59m
kube-system kube-proxy-k59ks 1/1 Running 2 3h59m
kube-system kube-proxy-mn96x 1/1 Running 1 3h56m
kube-system kube-proxy-vxj8b 1/1 Running 1 3h56m
kube-system kube-scheduler-master 1/1 Running 2 3h59m
Regarding the calico CrashLoopBackOff: kubeadm uses the default interface eth0 to bootstrap the cluster.
But the eth0 interface is used by Vagrant (for ssh).
You could configure the kubelet to use a private IP address (for instance) instead of eth0.
You'll have to do that for each node, then vagrant reload.
sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# Add the Environment line to 10-kubeadm.conf and replace your_node_ip
Environment="KUBELET_EXTRA_ARGS=--node-ip=your_node_ip"
Hope it helps

Kubernetes without pod metrics

I'm trying to deploy the metrics server to kubernetes, and something really strange is happening. I have one worker and one master, with the following list of pods:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default php-apache-774ff9d754-d7vp9 1/1 Running 0 2m43s 192.168.77.172 master-node <none> <none>
kube-system calico-kube-controllers-6b9d4c8765-x7pql 1/1 Running 2 4h11m 192.168.77.130 master-node <none> <none>
kube-system calico-node-d4rnh 0/1 Running 1 4h11m 10.221.194.166 master-node <none> <none>
kube-system calico-node-hwkmd 0/1 Running 1 4h11m 10.221.195.58 free5gc-virtual-machine <none> <none>
kube-system coredns-6955765f44-kf4dr 1/1 Running 1 4h20m 192.168.178.65 free5gc-virtual-machine <none> <none>
kube-system coredns-6955765f44-s58rf 1/1 Running 1 4h20m 192.168.178.66 free5gc-virtual-machine <none> <none>
kube-system etcd-free5gc-virtual-machine 1/1 Running 1 4h21m 10.221.195.58 free5gc-virtual-machine <none> <none>
kube-system kube-apiserver-free5gc-virtual-machine 1/1 Running 1 4h21m 10.221.195.58 free5gc-virtual-machine <none> <none>
kube-system kube-controller-manager-free5gc-virtual-machine 1/1 Running 1 4h21m 10.221.195.58 free5gc-virtual-machine <none> <none>
kube-system kube-proxy-brvdg 1/1 Running 1 4h19m 10.221.194.166 master-node <none> <none>
kube-system kube-proxy-lfzjw 1/1 Running 1 4h20m 10.221.195.58 free5gc-virtual-machine <none> <none>
kube-system kube-scheduler-free5gc-virtual-machine 1/1 Running 1 4h21m 10.221.195.58 free5gc-virtual-machine <none> <none>
kube-system metrics-server-86c6d8b9bf-p2hh8 1/1 Running 0 2m43s 192.168.77.171 master-node <none> <none>
When I try to get the metrics I see the following:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache <unknown>/50% 1 10 1 3m58s
free5gc@free5gc-virtual-machine:~/Desktop/metrics-server/deploy$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
free5gc@free5gc-virtual-machine:~/Desktop/metrics-server/deploy$ kubectl top pods --all-namespaces
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
Lastly, here is the log output (v=6) of metrics-server:
free5gc@free5gc-virtual-machine:~/Desktop/metrics-server/deploy$ kubectl logs metrics-server-86c6d8b9bf-p2hh8 -n kube-system
I0206 18:16:18.657605 1 serving.go:273] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0206 18:16:19.367356 1 round_trippers.go:405] GET https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication 200 OK in 7 milliseconds
I0206 18:16:19.370573 1 round_trippers.go:405] GET https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication 200 OK in 1 milliseconds
I0206 18:16:19.373245 1 round_trippers.go:405] GET https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication 200 OK in 1 milliseconds
I0206 18:16:19.375024 1 round_trippers.go:405] GET https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication 200 OK in 1 milliseconds
[restful] 2020/02/06 18:16:19 log.go:33: [restful/swagger] listing is available at https://:4443/swaggerapi
[restful] 2020/02/06 18:16:19 log.go:33: [restful/swagger] https://:4443/swaggerui/ is mapped to folder /swagger-ui/
I0206 18:16:19.421207 1 healthz.go:83] Installing healthz checkers:"ping", "poststarthook/generic-apiserver-start-informers", "healthz"
I0206 18:16:19.421641 1 serve.go:96] Serving securely on [::]:4443
I0206 18:16:19.421873 1 reflector.go:202] Starting reflector *v1.Pod (0s) from github.com/kubernetes-incubator/metrics-server/vendor/k8s.io/client-go/informers/factory.go:130
I0206 18:16:19.421891 1 reflector.go:240] Listing and watching *v1.Pod from github.com/kubernetes-incubator/metrics-server/vendor/k8s.io/client-go/informers/factory.go:130
I0206 18:16:19.421914 1 reflector.go:202] Starting reflector *v1.Node (0s) from github.com/kubernetes-incubator/metrics-server/vendor/k8s.io/client-go/informers/factory.go:130
I0206 18:16:19.421929 1 reflector.go:240] Listing and watching *v1.Node from github.com/kubernetes-incubator/metrics-server/vendor/k8s.io/client-go/informers/factory.go:130
I0206 18:16:19.423052 1 round_trippers.go:405] GET https://10.96.0.1:443/api/v1/nodes?limit=500&resourceVersion=0 200 OK in 1 milliseconds
I0206 18:16:19.424261 1 round_trippers.go:405] GET https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0 200 OK in 2 milliseconds
I0206 18:16:19.425586 1 round_trippers.go:405] GET https://10.96.0.1:443/api/v1/nodes?resourceVersion=38924&timeoutSeconds=481&watch=true 200 OK in 0 milliseconds
I0206 18:16:19.433545 1 round_trippers.go:405] GET https://10.96.0.1:443/api/v1/pods?resourceVersion=39246&timeoutSeconds=582&watch=true 200 OK in 0 milliseconds
I0206 18:16:49.388514 1 manager.go:99] Beginning cycle, collecting metrics...
I0206 18:16:49.388598 1 manager.go:95] Scraping metrics from 2 sources
I0206 18:16:49.395742 1 manager.go:120] Querying source: kubelet_summary:free5gc-virtual-machine
I0206 18:16:49.400574 1 manager.go:120] Querying source: kubelet_summary:master-node
I0206 18:16:49.413751 1 round_trippers.go:405] GET https://10.221.194.166:10250/stats/summary/ 200 OK in 13 milliseconds
I0206 18:16:49.414317 1 round_trippers.go:405] GET https://10.221.195.58:10250/stats/summary/ 200 OK in 18 milliseconds
I0206 18:16:49.417044 1 manager.go:150] ScrapeMetrics: time: 28.428677ms, nodes: 2, pods: 13
I0206 18:16:49.417062 1 manager.go:115] ...Storing metrics...
I0206 18:16:49.417083 1 manager.go:126] ...Cycle complete
Using the log output with v=10, I can even see the health details of each pod, but I get nothing when running kubectl get hpa or kubectl top nodes. Can someone give me a hint? Furthermore, my metrics manifest is:
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: metrics-server
namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: metrics-server
namespace: kube-system
labels:
k8s-app: metrics-server
spec:
selector:
matchLabels:
k8s-app: metrics-server
template:
metadata:
name: metrics-server
labels:
k8s-app: metrics-server
spec:
serviceAccountName: metrics-server
volumes:
# mount in tmp so we can safely use from-scratch images and/or read-only containers
- name: tmp-dir
emptyDir: {}
containers:
- name: metrics-server
image: k8s.gcr.io/metrics-server-amd64:v0.3.1
args:
- /metrics-server
- --metric-resolution=30s
- --requestheader-allowed-names=aggregator
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-insecure-tls
- --v=6
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
#- --kubelet-preferred-address-types=InternalIP
ports:
- name: main-port
containerPort: 4443
protocol: TCP
securityContext:
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
imagePullPolicy: Always
volumeMounts:
- name: tmp-dir
mountPath: /tmp
nodeSelector:
beta.kubernetes.io/os: linux
kubernetes.io/arch: "amd64"
And I can see the following:
free5gc@free5gc-virtual-machine:~/Desktop/metrics-server/deploy$ kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
creationTimestamp: "2020-02-06T18:57:28Z"
name: v1beta1.metrics.k8s.io
resourceVersion: "45583"
selfLink: /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
uid: ca439221-b987-4c13-b0e0-8d2bb237e612
spec:
group: metrics.k8s.io
groupPriorityMinimum: 100
insecureSkipTLSVerify: true
service:
name: metrics-server
namespace: kube-system
port: 443
version: v1beta1
versionPriority: 100
status:
conditions:
- lastTransitionTime: "2020-02-06T18:57:28Z"
message: 'failing or missing response from https://10.110.144.114:443/apis/metrics.k8s.io/v1beta1:
Get https://10.110.144.114:443/apis/metrics.k8s.io/v1beta1: dial tcp 10.110.144.114:443:
connect: no route to host'
reason: FailedDiscoveryCheck
status: "False"
type: Available
I have reproduced your issue (on Google Compute Engine) and tried a few scenarios to find a workaround/solution.
The first thing I want to mention is that you have only provided the ServiceAccount and Deployment YAML. You also need a ClusterRoleBinding, RoleBinding, APIService, etc. All the needed YAMLs can be found in this Github repo.
To quickly deploy metrics-server with all required config, you can use:
$ git clone https://github.com/kubernetes-sigs/metrics-server.git
$ cd metrics-server/deploy/
$ kubectl apply -f kubernetes/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
The second thing I would advise is to check your CNI pods (calico-node-d4rnh and calico-node-hwkmd): they were created 4h11m ago but are still Ready 0/1.
The last thing regards gathering CPU and memory data from pods and nodes.
Using Calico
If you are using single-node kubeadm, it will work correctly; however, when you use more than 1 node with kubeadm, this will cause some issues. There are many similar threads on Github regarding this. I've tried various flags in args:, but with no success. In the metrics-server logs (-v=6) you will be able to see that metrics are being gathered. In this Github thread, one of the Github users posted an answer which is a workaround for this issue. It's also mentioned in the K8s docs about hostNetwork.
Adding hostNetwork: true is what finally got metrics-server working for me. Without it, nada. Without the kubelet-preferred-address-types line, I could query my master node but not my two worker nodes, nor could I query pods, obviously undesirable results. Lack of kubelet-insecure-tls also results in an inoperable metrics-server installation.
spec:
hostNetwork: true
containers:
- args:
- --kubelet-insecure-tls
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP
- --v=6
image: k8s.gcr.io/metrics-server-amd64:v0.3.6
imagePullPolicy: Always
If you deploy with this config, it will work.
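If you'd rather not edit the manifest by hand, an equivalent strategic-merge patch is sketched below (assuming the metrics-server deployment already exists in kube-system):
kubectl -n kube-system patch deploy metrics-server \
  -p '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'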
$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name: v1beta1.metrics.k8s.io
...
Status:
Conditions:
Last Transition Time: 2020-02-20T09:37:59Z
Message: all checks passed
Reason: Passed
Status: True
Type: Available
Events: <none>
In addition, you can see the difference hostNetwork: true makes when you check iptables: there are many more entries compared to a deployment without this config.
After that, you can edit the deployment and remove or comment out hostNetwork: true.
$ kubectl edit deploy metrics-server -n kube-system
deployment.apps/metrics-server edited
$ kubectl top pods
NAME CPU(cores) MEMORY(bytes)
nginx-6db489d4b7-2qhzw 0m 3Mi
nginx-6db489d4b7-9fvrj 0m 2Mi
nginx-6db489d4b7-dgbf9 0m 2Mi
nginx-6db489d4b7-dvcz5 0m 2Mi
Also, you will be able to find metrics using:
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
For better visibility you can also use jq:
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods | jq .
Using Weave Net
When you use Weave Net instead of Calico, it will work without setting hostNetwork.
$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
However, you will need to work with certificates. But if you don't care about security, you can just use --kubelet-insecure-tls as in the previous example, where Calico was used.

kubernetes metrics server don't start

I'm trying to connect to the kubernetes dashboard.
I have the latest version of kubernetes, v1.12, installed with kubeadm on a server.
I downloaded the metrics-server from github and ran:
kubectl create -f deploy/1.8+
but I get this error:
kube-system metrics-server-5cbbc84f8c-tjfxd 0/1 Pending 0 12m
with no logs to debug:
error: the server doesn't have a resource type "logs"
I don't want to install heapster because it is DEPRECATED.
UPDATE
Hello, and thanks.
When I run the taint command I get:
error: at least one taint update is required
and for the command
kubectl describe deployment metrics-server -n kube-system
I get this output:
Name: metrics-server
Namespace: kube-system
CreationTimestamp: Thu, 18 Oct 2018 14:34:42 +0000
Labels: k8s-app=metrics-server
Annotations: deployment.kubernetes.io/revision: 1
kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata": {"annotations":{},"labels":{"k8s-app":"metrics-server"},"name":"metrics-...
Selector: k8s-app=metrics-server
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: k8s-app=metrics-server
Service Account: metrics-server
Containers:
metrics-server:
Image: k8s.gcr.io/metrics-server-amd64:v0.3.1
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: metrics-server-5cbbc84f8c (1/1 replicas created)
Events: <none>
Command:
kubectl get nodes
The output for this is just the IP of the node, and nothing special.
Any ideas on what to do to get the kubernetes dashboard working?
I suppose you are trying to set up metrics-server on your master node.
If you issue kubectl describe deployment metrics-server -n kube-system, I believe you will see something like this:
Name: metrics-server
Namespace: kube-system
CreationTimestamp: Thu, 18 Oct 2018 15:57:34 +0000
Labels: k8s-app=metrics-server
Annotations: deployment.kubernetes.io/revision: 1
Selector: k8s-app=metrics-server
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
But if you describe your node, you will see a taint that prevents scheduling new pods on the master node:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-master-1 Ready master 17m v1.12.1
kubectl describe node kube-master-1
Name: kube-master-1
...
Taints: node-role.kubernetes.io/master:NoSchedule
You have to remove this taint:
kubectl taint node kube-master-1 node-role.kubernetes.io/master:NoSchedule-
node/kube-master-1 untainted
Result:
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-xvc77 2/2 Running 0 20m
kube-system coredns-576cbf47c7-rj4wh 1/1 Running 0 21m
kube-system coredns-576cbf47c7-vsjsf 1/1 Running 0 21m
kube-system etcd-kube-master-1 1/1 Running 0 20m
kube-system kube-apiserver-kube-master-1 1/1 Running 0 20m
kube-system kube-controller-manager-kube-master-1 1/1 Running 0 20m
kube-system kube-proxy-xp5zh 1/1 Running 0 21m
kube-system kube-scheduler-kube-master-1 1/1 Running 0 20m
kube-system metrics-server-5cbbc84f8c-l2t76 1/1 Running 0 18m
But this is not the best approach. The better approach is to join a worker node and set up metrics-server there: there won't be any issues, and there's no need to touch the taint on the master node. A sketch of doing that follows.
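A sketch of joining such a worker, assuming a kubeadm-bootstrapped cluster:
# on the master: print a fresh join command with a valid token
kubeadm token create --print-join-command
# on the new worker: run the printed `kubeadm join ...` command, then verify from the master
kubectl get nodes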
Hope it will help you.
The above answer by "Vit" is correct: either remove the taint from the existing node group, or create a new node group without any taint.