Kubernetes metrics-server doesn't start

I am trying to connect to the Kubernetes dashboard.
I have the latest version of Kubernetes, v1.12, installed with kubeadm on a server.
I downloaded metrics-server from GitHub and ran:
kubectl create -f deploy/1.8+
but I get this error:
kube-system metrics-server-5cbbc84f8c-tjfxd 0/1 Pending 0 12m
with no log to debug:
error: the server doesn't have a resource type "logs"
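(For reference, that error typically appears when "logs" is passed to kubectl get as a resource type; a direct invocation, as a sketch using the pod name from the output above, would be:
kubectl logs metrics-server-5cbbc84f8c-tjfxd -n kube-system
)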
I don't want to install Heapster, because it is DEPRECATED.
UPDATE
Hello, and thanks.
When I run the taint command I get:
error: at least one taint update is required
and from the command
kubectl describe deployment metrics-server -n kube-system
I get this output:
Name:                   metrics-server
Namespace:              kube-system
CreationTimestamp:      Thu, 18 Oct 2018 14:34:42 +0000
Labels:                 k8s-app=metrics-server
Annotations:            deployment.kubernetes.io/revision: 1
                        kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata": {"annotations":{},"labels":{"k8s-app":"metrics-server"},"name":"metrics-...
Selector:               k8s-app=metrics-server
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
Pod Template:
  Labels:           k8s-app=metrics-server
  Service Account:  metrics-server
  Containers:
   metrics-server:
    Image:        k8s.gcr.io/metrics-server-amd64:v0.3.1
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /tmp from tmp-dir (rw)
  Volumes:
   tmp-dir:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
Conditions:
  Type       Status  Reason
  ----       ------  ------
  Available  True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   metrics-server-5cbbc84f8c (1/1 replicas created)
Events:          <none>
The command:
kubectl get nodes
outputs just the IP of the node, nothing special.
Any ideas on what to do to get the Kubernetes dashboard working?
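For a pod stuck in Pending, the scheduler usually records the reason as a pod event; a general diagnostic (a sketch using the pod name from above) would be:
kubectl describe pod metrics-server-5cbbc84f8c-tjfxd -n kube-system
kubectl get events -n kube-system --sort-by=.metadata.creationTimestamp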

I suppose you are trying to set up metrics-server on your master node.
If you issue kubectl describe deployment metrics-server -n kube-system I believe you will see something like this:
Name:               metrics-server
Namespace:          kube-system
CreationTimestamp:  Thu, 18 Oct 2018 15:57:34 +0000
Labels:             k8s-app=metrics-server
Annotations:        deployment.kubernetes.io/revision: 1
Selector:           k8s-app=metrics-server
Replicas:           1 desired | 1 updated | 1 total | 0 available | 1 unavailable
But if you describe your node, you will see the taint that prevents you from scheduling new pods on the master node:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-master-1 Ready master 17m v1.12.1
kubectl describe node kube-master-1
Name: kube-master-1
...
Taints: node-role.kubernetes.io/master:NoSchedule
You have to remove this taint:
kubectl taint node kube-master-1 node-role.kubernetes.io/master:NoSchedule-
node/kube-master-1 untainted
Result:
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-xvc77 2/2 Running 0 20m
kube-system coredns-576cbf47c7-rj4wh 1/1 Running 0 21m
kube-system coredns-576cbf47c7-vsjsf 1/1 Running 0 21m
kube-system etcd-kube-master-1 1/1 Running 0 20m
kube-system kube-apiserver-kube-master-1 1/1 Running 0 20m
kube-system kube-controller-manager-kube-master-1 1/1 Running 0 20m
kube-system kube-proxy-xp5zh 1/1 Running 0 21m
kube-system kube-scheduler-kube-master-1 1/1 Running 0 20m
kube-system metrics-server-5cbbc84f8c-l2t76 1/1 Running 0 18m
But this is not the best approach. A better approach is to join a worker node and set up metrics-server there; then there are no issues and no need to touch the taint on the master node.
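For completeness, a sketch of joining a worker with kubeadm (the token and hash are placeholders printed by the first command, run on the master):
kubeadm token create --print-join-command
# then run the printed command on the new worker, e.g.:
# kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>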
Hope it will help you.

The above answer by "Vit" is correct: either remove the taint from the existing node group or create a new node group without any taint.

Related

Kubernetes Dashboard CrashLoopBackOff: timeout error on Raspberry Pi cluster

This should be a simple task: I simply want to run the Kubernetes Dashboard on a clean install of Kubernetes on a Raspberry Pi cluster.
What I've done:
Set up the initial cluster (hostname, static IP, cgroup, swap space, install and configure Docker, install Kubernetes, set up the Kubernetes network, and join nodes)
I have flannel installed
I have applied the dashboard
A bunch of random testing trying to figure this out
Obviously, as seen below, the container in the dashboard pod is not working because it cannot access kubernetes-dashboard-csrf. I have no idea why this cannot be accessed; my only thought is that I missed a step when setting up the cluster. I've followed about six different guides without success, prioritizing the official guide. I have also seen quite a few people with the same or similar issues, most of whom have not posted a resolution. Thanks!
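("I have applied the dashboard" presumably means something like the standard recommended manifest; this is an assumption, with the v2.4.0 tag inferred from the pod image shown below:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.4.0/aio/deploy/recommended.yaml
)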
Nodes: kubectl get nodes
NAME STATUS ROLES AGE VERSION
gus3 Ready <none> 346d v1.23.1
juliet3 Ready <none> 346d v1.23.1
shawn4 Ready <none> 346d v1.23.1
vick4 Ready control-plane,master 346d v1.23.1
All Pods: kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-74ff55c5b-7j2xg 1/1 Running 27 346d
kube-system coredns-74ff55c5b-cb2x8 1/1 Running 27 346d
kube-system etcd-vick4 1/1 Running 2 169m
kube-system kube-apiserver-vick4 1/1 Running 2 169m
kube-system kube-controller-manager-vick4 1/1 Running 2 169m
kube-system kube-flannel-ds-gclmp 1/1 Running 0 11m
kube-system kube-flannel-ds-hshjv 1/1 Running 0 12m
kube-system kube-flannel-ds-kdd4w 1/1 Running 0 11m
kube-system kube-flannel-ds-wzhkt 1/1 Running 0 10m
kube-system kube-proxy-4t25v 1/1 Running 26 346d
kube-system kube-proxy-b6vbx 1/1 Running 26 346d
kube-system kube-proxy-jgj4s 1/1 Running 27 346d
kube-system kube-proxy-n65sl 1/1 Running 26 346d
kube-system kube-scheduler-vick4 1/1 Running 2 169m
kubernetes-dashboard dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 77m
kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 18 77m
Resources: kubectl get all -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
pod/dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 79m
pod/kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 19 79m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 172.20.0.191 <none> 8000/TCP 79m
service/kubernetes-dashboard ClusterIP 172.20.0.15 <none> 443/TCP 79m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/dashboard-metrics-scraper 1/1 1 1 79m
deployment.apps/kubernetes-dashboard 0/1 1 0 79m
NAME DESIRED CURRENT READY AGE
replicaset.apps/dashboard-metrics-scraper-5b8896d7fc 1 1 1 79m
replicaset.apps/kubernetes-dashboard-897c7599f 1 1 0 79m
Notice CrashLoopBackOff
Pod Details: kubectl describe pods kubernetes-dashboard-897c7599f-qss5p -n kubernetes-dashboard
Name:         kubernetes-dashboard-897c7599f-qss5p
Namespace:    kubernetes-dashboard
Priority:     0
Node:         shawn4/192.168.10.71
Start Time:   Fri, 17 Dec 2021 18:52:15 +0000
Labels:       k8s-app=kubernetes-dashboard
              pod-template-hash=897c7599f
Annotations:  <none>
Status:       Running
IP:           172.19.1.75
IPs:
  IP:           172.19.1.75
Controlled By:  ReplicaSet/kubernetes-dashboard-897c7599f
Containers:
  kubernetes-dashboard:
    Container ID:  docker://894a354e40ca1a95885e149dcd75415e0f186ead3f2e05ec0787f4b1c7a29622
    Image:         kubernetesui/dashboard:v2.4.0
    Image ID:      docker-pullable://kubernetesui/dashboard@sha256:526850ae4ea9aba360e72b6df69fd3126b129d446efe83ac5250282b85f95b7f
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --auto-generate-certificates
      --namespace=kubernetes-dashboard
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 17 Dec 2021 20:10:19 +0000
      Finished:     Fri, 17 Dec 2021 20:10:49 +0000
    Ready:          False
    Restart Count:  19
    Liveness:       http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /certs from kubernetes-dashboard-certs (rw)
      /tmp from tmp-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-wq9m8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kubernetes-dashboard-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubernetes-dashboard-certs
    Optional:    false
  tmp-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kubernetes-dashboard-token-wq9m8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubernetes-dashboard-token-wq9m8
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                  From     Message
  ----     ------   ----                 ----     -------
  Warning  BackOff  21s (x327 over 79m)  kubelet  Back-off restarting failed container
Logs: kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p
2021/12/17 20:10:19 Starting overwatch
2021/12/17 20:10:19 Using namespace: kubernetes-dashboard
2021/12/17 20:10:19 Using in-cluster config to connect to apiserver
2021/12/17 20:10:19 Using secret token for csrf signing
2021/12/17 20:10:19 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get "https://172.20.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf": dial tcp 172.20.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes/dashboard/src/app/backend/client/csrf.(*csrfTokenManager).init(0x400055fae8)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:41 +0x350
github.com/kubernetes/dashboard/src/app/backend/client/csrf.NewCsrfTokenManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:66
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).initCSRFKey(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:502 +0x8c
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).init(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:470 +0x40
github.com/kubernetes/dashboard/src/app/backend/client.NewClientManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:551
main.main()
/home/runner/work/dashboard/dashboard/src/app/backend/dashboard.go:95 +0x1dc
If you need any more information please ask!
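One way to confirm that pods cannot reach the API service at all is to probe the endpoint from a throwaway pod (a diagnostic sketch; the service IP is taken from the panic above, and ideally the pod should land on the same node as the dashboard):
kubectl run nettest -n kubernetes-dashboard --rm -it --restart=Never --image=busybox:1.34 -- wget -T 5 --no-check-certificate -qO- https://172.20.0.1:443/version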
UPDATE 12/29/21:
Fixed this issue by reinstalling the cluster to the newest versions of Kubernetes and Ubuntu.
It turned out there were several issues:
I was using Ubuntu "Buster", which is deprecated.
My client/server Kubernetes versions were out of sync by +/- 0.3.
I was following outdated instructions.
I reinstalled the whole cluster following the official Kubernetes guide and, with a few snags along the way, it works!
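For anyone checking for the same client/server skew, a quick way (output format varies slightly by release) is:
kubectl version --short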

Kubernetes daemonset creating two pods instead of one (expected)

I have the following local 2-node Kubernetes cluster:
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
srl1 Ready control-plane,master 2d18h v1.21.2 xxx.xxx.12.58 <none> Ubuntu 20.04.2 LTS 5.4.0-80-generic docker://20.10.7
srl2 Ready <none> 2d18h v1.21.3 xxx.xxx.80.72 <none> Ubuntu 18.04.2 LTS 5.4.0-80-generic docker://20.10.2
I am trying to deploy an application using a cluster-creation Python script (https://github.com/hydro-project/cluster/blob/master/hydro/cluster/create_cluster.py).
When it creates a routing node via apps_client.create_namespaced_daemon_set(namespace=util.NAMESPACE, body=yml), it is expected to create a single pod from the routing-ds.yaml file (given below) and assign it to the routing DaemonSet (kind). However, as you can see, it is creating two routing pods, one on each physical node, instead of one. (FYI: my master can schedule pods.)
akazad@srl1:~/hydro-project/cluster$ kubectl get all -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default pod/management-pod 1/1 Running 0 25m 192.168.190.77 srl2 <none> <none>
default pod/monitoring-pod 1/1 Running 0 25m 192.168.120.71 srl1 <none> <none>
default pod/routing-nodes-9q7dr 1/1 Running 0 24m xxx.xxx.12.58 srl1 <none> <none>
default pod/routing-nodes-kfbnv 1/1 Running 0 24m xxx.xxx.80.72 srl2 <none> <none>
kube-system pod/calico-kube-controllers-7676785684-tpz7q 1/1 Running 0 2d19h 192.168.120.65 srl1 <none> <none>
kube-system pod/calico-node-lnxtb 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/calico-node-mdvpd 1/1 Running 0 2d19h xxx.xxx.80.72 srl2 <none> <none>
kube-system pod/coredns-558bd4d5db-vfghf 1/1 Running 0 2d19h 192.168.120.66 srl1 <none> <none>
kube-system pod/coredns-558bd4d5db-x7jhj 1/1 Running 0 2d19h xxx.xxx.120.67 srl1 <none> <none>
kube-system pod/etcd-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-apiserver-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-controller-manager-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-proxy-l8fds 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-proxy-szrng 1/1 Running 0 2d19h xxx.xxx.80.72 srl2 <none> <none>
kube-system pod/kube-scheduler-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
metallb-system pod/controller-6b78bff7d9-t7gjr 1/1 Running 0 2d19h 192.168.190.65 srl2 <none> <none>
metallb-system pod/speaker-qsqnc 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
metallb-system pod/speaker-s4pp8 1/1 Running 0 2d19h xxx.xxx.80.72 srl2 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 2d19h <none>
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 2d19h k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
default daemonset.apps/routing-nodes 2 2 2 2 2 <none> 24m routing-container akazad1/srlanna:v2 role=routing
kube-system daemonset.apps/calico-node 2 2 2 2 2 kubernetes.io/os=linux 2d19h calico-node calico/node:v3.14.2 k8s-app=calico-node
kube-system daemonset.apps/kube-proxy 2 2 2 2 2 kubernetes.io/os=linux 2d19h kube-proxy k8s.gcr.io/kube-proxy:v1.21.3 k8s-app=kube-proxy
metallb-system daemonset.apps/speaker 2 2 2 2 2 kubernetes.io/os=linux 2d19h speaker quay.io/metallb/speaker:v0.10.2 app=metallb,component=speaker
However, when it creates pods directly from management-pod.yaml (given below), it creates one as expected.
Why is the DaemonSet creating two pods instead of one?
Code segment where it creates the DaemonSet for the routing nodes:
for i in range(len(kinds)):
    kind = kinds[i]

    # Create should only be true when the DaemonSet is being created for the
    # first time -- i.e., when this is called from create_cluster. After that,
    # we can basically ignore this because the DaemonSet will take care of
    # adding pods to created nodes.
    if create:
        fname = 'yaml/ds/%s-ds.yml' % kind
        yml = util.load_yaml(fname, prefix)

        for container in yml['spec']['template']['spec']['containers']:
            env = container['env']
            util.replace_yaml_val(env, 'ROUTING_IPS', route_str)
            util.replace_yaml_val(env, 'ROUTE_ADDR', route_addr)
            util.replace_yaml_val(env, 'SCHED_IPS', sched_str)
            util.replace_yaml_val(env, 'FUNCTION_ADDR', function_addr)
            util.replace_yaml_val(env, 'MON_IPS', mon_str)
            util.replace_yaml_val(env, 'MGMT_IP', management_ip)
            util.replace_yaml_val(env, 'SEED_IP', seed_ip)

        apps_client.create_namespaced_daemon_set(namespace=util.NAMESPACE,
                                                 body=yml)

    # Wait until all pods of this kind are running
    res = []
    while len(res) != expected_counts[i]:
        res = util.get_pod_ips(client, 'role='+kind, is_running=True)

    pods = client.list_namespaced_pod(namespace=util.NAMESPACE,
                                      label_selector='role=' +
                                      kind).items

    created_pods = get_current_pod_container_pairs(pods)
I have removed the nodeSelector from all the YAML files, as I am running on a bare-metal cluster.
1. routing-ds.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: routing-nodes
  labels:
    role: routing
spec:
  selector:
    matchLabels:
      role: routing
  template:
    metadata:
      labels:
        role: routing
    spec:
      #nodeSelector:
      #  role: routing
      hostNetwork: true
      containers:
      - name: routing-container
        image: akazad1/srlanna:v2
        env:
        - name: SERVER_TYPE
          value: r
        - name: MON_IPS
          value: MON_IPS_DUMMY
        - name: REPO_ORG
          value: hydro-project
        - name: REPO_BRANCH
          value: master
2. management-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: management-pod
  labels:
    role: management
spec:
  restartPolicy: Never
  containers:
  - name: management-container
    image: hydroproject/management
    env:
    #- name: AWS_ACCESS_KEY_ID
    #  value: ACCESS_KEY_ID_DUMMY
    #- name: AWS_SECRET_ACCESS_KEY
    #  value: SECRET_KEY_DUMMY
    #- name: KOPS_STATE_STORE
    #  value: KOPS_BUCKET_DUMMY
    - name: HYDRO_CLUSTER_NAME
      value: CLUSTER_NAME
    - name: REPO_ORG
      value: hydro-project
    - name: REPO_BRANCH
      value: master
    - name: ANNA_REPO_ORG
      value: hydro-project
    - name: ANNA_REPO_BRANCH
      value: master
  # nodeSelector:
  #   role: general
There may be a misunderstanding: you have to use kind: Deployment if you want to manage the number of replicas (pods: 1, 2, 3, ..., n) on Kubernetes.
A DaemonSet's behavior is to run a pod on each available node in the cluster.
Your cluster has two nodes, so the DaemonSet runs a pod on each of them. If you add a node, the DaemonSet will automatically create a pod on that node as well.
kind: Pod
creates a single pod only, which is its default behavior.
The following are some of the Kubernetes Objects:
pods
ReplicationController (Manages Pods)
Deployment (Manages Pods)
StatefulSets
DaemonSets
You can read more at : https://chkrishna.medium.com/kubernetes-objects-e0a8b93b5cdc
Official document : https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/
If you want to manage pods with a controller, kind: Deployment is best: you can scale the replicas up and down, and you can specify the number of replicas (1, 2, 3, ...) in the YAML so that exactly that many run in the cluster.
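As a sketch of that suggestion (untested against the hydro scripts; env entries elided), a Deployment that runs exactly one routing pod, adapted from routing-ds.yaml above, would look like:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: routing-nodes
  labels:
    role: routing
spec:
  replicas: 1              # exactly one pod, scheduled on any eligible node
  selector:
    matchLabels:
      role: routing
  template:
    metadata:
      labels:
        role: routing
    spec:
      hostNetwork: true
      containers:
      - name: routing-container
        image: akazad1/srlanna:v2
        # env entries as in routing-ds.yaml above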

Argo sample workflows stuck in the pending state

I followed the Argo Workflows Getting Started documentation. Everything went smoothly until I ran the first sample workflow as described in 4. Run Sample Workflows. The workflow just gets stuck in the Pending state:
vagrant@master:~$ argo submit --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/hello-world.yaml
Name: hello-world-z4lbs
Namespace: default
ServiceAccount: default
Status: Pending
Created: Thu May 14 12:36:45 +0000 (now)
vagrant@master:~$ argo list
NAME STATUS AGE DURATION PRIORITY
hello-world-z4lbs Pending 27m 0s 0
Here it was mentioned that taints on the master node may be the problem, so I untainted the master node:
vagrant@master:~$ kubectl taint nodes --all node-role.kubernetes.io/master-
node/master untainted
taint "node-role.kubernetes.io/master" not found
taint "node-role.kubernetes.io/master" not found
Then I deleted the pending workflow and resubmitted it, but it got stuck in the pending state again.
The details of the newly submitted workflow that is also stuck:
vagrant@master:~$ kubectl describe workflow hello-world-8kvmb
Name:         hello-world-8kvmb
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  argoproj.io/v1alpha1
Kind:         Workflow
Metadata:
  Creation Timestamp:  2020-05-14T13:57:44Z
  Generate Name:       hello-world-
  Generation:          1
  Managed Fields:
    API Version:  argoproj.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName:
      f:spec:
        .:
        f:arguments:
        f:entrypoint:
        f:templates:
      f:status:
        .:
        f:finishedAt:
        f:startedAt:
    Manager:         argo
    Operation:       Update
    Time:            2020-05-14T13:57:44Z
  Resource Version:  16780
  Self Link:         /apis/argoproj.io/v1alpha1/namespaces/default/workflows/hello-world-8kvmb
  UID:               aa82d005-b7ac-411f-9d0b-93f34876b673
Spec:
  Arguments:
  Entrypoint:  whalesay
  Templates:
    Arguments:
    Container:
      Args:
        hello world
      Command:
        cowsay
      Image:  docker/whalesay:latest
      Name:
      Resources:
    Inputs:
    Metadata:
    Name:  whalesay
    Outputs:
Status:
  Finished At:  <nil>
  Started At:   <nil>
Events:  <none>
While trying to get the workflow-controller logs, I get the following error:
vagrant@master:~$ kubectl logs -n argo -l app=workflow-controller
Error from server (BadRequest): container "workflow-controller" in pod "workflow-controller-6c4787844c-lbksm" is waiting to start: ContainerCreating
The details for the corresponding workflow-controller pod:
vagrant@master:~$ kubectl -n argo describe pods/workflow-controller-6c4787844c-lbksm
Name:           workflow-controller-6c4787844c-lbksm
Namespace:      argo
Priority:       0
Node:           node-1/192.168.50.11
Start Time:     Thu, 14 May 2020 12:08:29 +0000
Labels:         app=workflow-controller
                pod-template-hash=6c4787844c
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/workflow-controller-6c4787844c
Containers:
  workflow-controller:
    Container ID:
    Image:         argoproj/workflow-controller:v2.8.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      workflow-controller
    Args:
      --configmap
      workflow-controller-configmap
      --executor-image
      argoproj/argoexec:v2.8.0
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from argo-token-pz4fd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  argo-token-pz4fd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  argo-token-pz4fd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                      From             Message
  ----     ------                  ----                     ----             -------
  Normal   SandboxChanged          7m17s (x4739 over 112m)  kubelet, node-1  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  2m18s (x4950 over 112m)  kubelet, node-1  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1bd1fd11dfe677c749b4a1260c29c2f8cff0d55de113d154a822e68b41f9438e" network for pod "workflow-controller-6c4787844c-lbksm": networkPlugin cni failed to set up pod "workflow-controller-6c4787844c-lbksm_argo" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
I run Argo 2.8:
vagrant@master:~$ argo version
argo: v2.8.0
BuildDate: 2020-05-11T22:55:16Z
GitCommit: 8f696174746ed01b9bf1941ad03da62d312df641
GitTreeState: clean
GitTag: v2.8.0
GoVersion: go1.13.4
Compiler: gc
Platform: linux/amd64
I have checked the cluster status and it looks OK:
vagrant@master:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 95m v1.18.2
node-1 Ready <none> 92m v1.18.2
node-2 Ready <none> 92m v1.18.2
As for the K8s cluster installation, I created it using Vagrant as described here, the only differences being:
libvirt as the provider
a newer version of Ubuntu: generic/ubuntu1804
a newer version of Calico: v3.14
Any idea why the workflows get stuck in the pending state and how to fix it?
Workflows start in the Pending state and then are moved through their steps by the workflow-controller pod (which is installed in the cluster as part of Argo).
The workflow-controller pod is stuck in ContainerCreating. kubectl describe po {workflow-controller pod} reveals a Calico-related network error.
As mentioned in the comments, it looks like a common Calico error. Once you clear that up, your hello-world workflow should execute just fine.
Note from OP: Further debugging confirms the Calico problem (Calico nodes are not in the running state):
vagrant@master:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
argo argo-server-84946785b-94bfs 0/1 ContainerCreating 0 3h59m
argo workflow-controller-6c4787844c-lbksm 0/1 ContainerCreating 0 3h59m
kube-system calico-kube-controllers-74d45555dd-zhkp6 0/1 CrashLoopBackOff 56 3h59m
kube-system calico-node-2n9kt 0/1 CrashLoopBackOff 72 3h59m
kube-system calico-node-b8sb8 0/1 Running 70 3h56m
kube-system calico-node-pslzs 0/1 CrashLoopBackOff 67 3h56m
kube-system coredns-66bff467f8-rmxsp 0/1 ContainerCreating 0 3h59m
kube-system coredns-66bff467f8-z4lbq 0/1 ContainerCreating 0 3h59m
kube-system etcd-master 1/1 Running 2 3h59m
kube-system kube-apiserver-master 1/1 Running 2 3h59m
kube-system kube-controller-manager-master 1/1 Running 2 3h59m
kube-system kube-proxy-k59ks 1/1 Running 2 3h59m
kube-system kube-proxy-mn96x 1/1 Running 1 3h56m
kube-system kube-proxy-vxj8b 1/1 Running 1 3h56m
kube-system kube-scheduler-master 1/1 Running 2 3h59m
For the Calico CrashLoopBackOff: kubeadm uses the default interface, eth0, to bootstrap the cluster.
But the eth0 interface is used by Vagrant (for SSH).
You could configure the kubelet to use a private IP address (for instance) instead of eth0.
You'll have to do that for each node, then vagrant reload.
sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
#Add the Environment line in 10-kubeadm.conf and replace your_node_ip
Environment="KUBELET_EXTRA_ARGS=--node-ip=your_node_ip"
Hope it helps

Horizontal Pod Autoscaler replicas based on the amount of the nodes in the cluster

I'm looking for a solution that will scale pods out automatically when nodes join the cluster and scale them back in when nodes are deleted.
We are running a WebApp on the nodes, and this requires graceful pod eviction/termination when a node is scheduled to be disconnected.
I was checking the option of using a DaemonSet, but since we use kOps for cluster rolling updates, it ignores DaemonSet evictions (the "--ignore-daemonsets" flag is not supported).
As a result the WebApp "dies" with the node, which is not acceptable for our application.
The ability of HorizontalPodAutoscaler to overwrite the number of replicas set in the deployment YAML could solve the problem.
I want to find a way to change min/maxReplicas in the HorizontalPodAutoscaler YAML dynamically, based on the number of nodes in the cluster:
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: MyWebApp
  minReplicas: "Num of nodes in the cluster"
  maxReplicas: "Num of nodes in the cluster"
Any ideas on how to get the number of nodes and update the HorizontalPodAutoscaler YAML in the cluster accordingly? Or any other solutions to the problem?
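(For reference, one rough way to do that dynamic update, as a sketch: the HPA name mywebapp-hpa is hypothetical, and this would have to run periodically, e.g. from a CronJob:
NODES=$(kubectl get nodes --no-headers | wc -l)
kubectl patch hpa mywebapp-hpa --type merge -p "{\"spec\":{\"minReplicas\":$NODES,\"maxReplicas\":$NODES}}"
)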
Have you tried using the nodeSelector spec in the DaemonSet YAML?
If you have a nodeSelector set in the YAML, then just before draining, remove the nodeSelector label value from the node and the DaemonSet should scale down gracefully; likewise, when you add a new node to the cluster, label it with the custom value and the DaemonSet will scale up.
This works for me, so you can try it and confirm with kOps.
First, label all your nodes with a custom label you will always have in your cluster.
Example:
kubectl label nodes k8s-master-1 mylabel=allow_demon_set
kubectl label nodes k8s-node-1 mylabel=allow_demon_set
kubectl label nodes k8s-node-2 mylabel=allow_demon_set
kubectl label nodes k8s-node-3 mylabel=allow_demon_set
Then add a node selector to your DaemonSet YAML.
Example.yaml, used below; note the added nodeSelector field:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      nodeSelector:
        mylabel: allow_demon_set
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
So the nodes are labeled as below:
$ kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-master-1 Ready master 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master-1,kubernetes.io/os=linux,mylabel=allow_demon_set,node-role.kubernetes.io/master=
k8s-node-1 Ready <none> 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,mylabel=allow_demon_set
k8s-node-2 Ready <none> 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-2,kubernetes.io/os=linux,mylabel=allow_demon_set
k8s-node-3 Ready <none> 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-3,kubernetes.io/os=linux,mylabel=allow_demon_set
Once you have the correct YAML, start the DaemonSet using it:
$ kubectl create -f Example.yaml
$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/fluentd-elasticsearch-jrgl6 1/1 Running 0 20s 10.244.3.19 k8s-node-3 <none> <none>
pod/fluentd-elasticsearch-rgcm2 1/1 Running 0 20s 10.244.0.6 k8s-master-1 <none> <none>
pod/fluentd-elasticsearch-wccr9 1/1 Running 0 20s 10.244.1.14 k8s-node-1 <none> <none>
pod/fluentd-elasticsearch-wxq5v 1/1 Running 0 20s 10.244.2.33 k8s-node-2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d <none>
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
daemonset.apps/fluentd-elasticsearch 4 4 4 4 4 mylabel=allow_demon_set 20s fluentd-elasticsearch quay.io/fluentd_elasticsearch/fluentd:v2.5.2 name=fluentd-elasticsearch
Then, before draining a node, we can just remove the custom label from it and the pod should scale down gracefully; after that, drain the node.
$ kubectl label nodes k8s-node-3 mylabel-
Check the DaemonSet and it should have scaled down:
ubuntu@k8s-kube-client:~$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/fluentd-elasticsearch-jrgl6 0/1 Terminating 0 2m36s 10.244.3.19 k8s-node-3 <none> <none>
pod/fluentd-elasticsearch-rgcm2 1/1 Running 0 2m36s 10.244.0.6 k8s-master-1 <none> <none>
pod/fluentd-elasticsearch-wccr9 1/1 Running 0 2m36s 10.244.1.14 k8s-node-1 <none> <none>
pod/fluentd-elasticsearch-wxq5v 1/1 Running 0 2m36s 10.244.2.33 k8s-node-2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d <none>
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
daemonset.apps/fluentd-elasticsearch 3 3 3 3 3 mylabel=allow_demon_set 2m36s fluentd-elasticsearch quay.io/fluentd_elasticsearch/fluentd:v2.5.2 name=fluentd-elasticsearch
Now, when a new node is added to the cluster, add the same custom label to it again and the DaemonSet will scale up:
$ kubectl label nodes k8s-node-3 mylabel=allow_demon_set
ubuntu@k8s-kube-client:~$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/fluentd-elasticsearch-22rsj 1/1 Running 0 2s 10.244.3.20 k8s-node-3 <none> <none>
pod/fluentd-elasticsearch-rgcm2 1/1 Running 0 5m28s 10.244.0.6 k8s-master-1 <none> <none>
pod/fluentd-elasticsearch-wccr9 1/1 Running 0 5m28s 10.244.1.14 k8s-node-1 <none> <none>
pod/fluentd-elasticsearch-wxq5v 1/1 Running 0 5m28s 10.244.2.33 k8s-node-2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d <none>
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
daemonset.apps/fluentd-elasticsearch 4 4 4 4 4 mylabel=allow_demon_set 5m28s fluentd-elasticsearch quay.io/fluentd_elasticsearch/fluentd:v2.5.2 name=fluentd-elasticsearch
Kindly confirm whether this is what you want to do and whether it works with kOps.

kubernetes cluster master node not ready

I do not know why my master node is not in Ready status; all pods on the cluster run normally. I use Kubernetes v1.7.5, the network plugin is Calico, and the OS version is "centos7.2.1511".
# kubectl get nodes
NAME STATUS AGE VERSION
k8s-node1 Ready 1h v1.7.5
k8s-node2 NotReady 1h v1.7.5
# kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system po/calico-node-11kvm 2/2 Running 0 33m
kube-system po/calico-policy-controller-1906845835-1nqjj 1/1 Running 0 33m
kube-system po/calicoctl 1/1 Running 0 33m
kube-system po/etcd-k8s-node2 1/1 Running 1 15m
kube-system po/kube-apiserver-k8s-node2 1/1 Running 1 15m
kube-system po/kube-controller-manager-k8s-node2 1/1 Running 2 15m
kube-system po/kube-dns-2425271678-2mh46 3/3 Running 0 1h
kube-system po/kube-proxy-qlmbx 1/1 Running 1 1h
kube-system po/kube-proxy-vwh6l 1/1 Running 0 1h
kube-system po/kube-scheduler-k8s-node2 1/1 Running 2 15m
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default svc/kubernetes 10.96.0.1 <none> 443/TCP 1h
kube-system svc/kube-dns 10.96.0.10 <none> 53/UDP,53/TCP 1h
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
kube-system deploy/calico-policy-controller 1 1 1 1 33m
kube-system deploy/kube-dns 1 1 1 1 1h
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system rs/calico-policy-controller-1906845835 1 1 1 33m
kube-system rs/kube-dns-2425271678 1 1 1 1h
Update
It seems the master node cannot recognize the Calico network plugin. I used kubeadm to install the k8s cluster. Because kubeadm starts etcd on 127.0.0.1:2379 on the master node, Calico on the other nodes cannot talk to etcd, so I modified etcd.yaml as follows, and now all Calico pods run fine. I am not very familiar with Calico; how do I fix this?
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --listen-client-urls=http://127.0.0.1:2379,http://10.161.233.80:2379
    - --advertise-client-urls=http://10.161.233.80:2379
    - --data-dir=/var/lib/etcd
    image: gcr.io/google_containers/etcd-amd64:3.0.17
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2379
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: certs
    - mountPath: /var/lib/etcd
      name: etcd
    - mountPath: /etc/kubernetes
      name: k8s
      readOnly: true
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/ssl/certs
    name: certs
  - hostPath:
      path: /var/lib/etcd
    name: etcd
  - hostPath:
      path: /etc/kubernetes
    name: k8s
status: {}
[root@k8s-node2 calico]# kubectl describe node k8s-node2
Name:         k8s-node2
Role:
Labels:       beta.kubernetes.io/arch=amd64
              beta.kubernetes.io/os=linux
              kubernetes.io/hostname=k8s-node2
              node-role.kubernetes.io/master=
Annotations:  node.alpha.kubernetes.io/ttl=0
              volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:       node-role.kubernetes.io/master:NoSchedule
CreationTimestamp:  Tue, 12 Sep 2017 15:20:57 +0800
Conditions:
  Type            Status  LastHeartbeatTime                 LastTransitionTime                Reason                      Message
  ----            ------  -----------------                 ------------------                ------                      -------
  OutOfDisk       False   Wed, 13 Sep 2017 10:25:58 +0800   Tue, 12 Sep 2017 15:20:57 +0800   KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure  False   Wed, 13 Sep 2017 10:25:58 +0800   Tue, 12 Sep 2017 15:20:57 +0800   KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Wed, 13 Sep 2017 10:25:58 +0800   Tue, 12 Sep 2017 15:20:57 +0800   KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready           False   Wed, 13 Sep 2017 10:25:58 +0800   Tue, 12 Sep 2017 15:20:57 +0800   KubeletNotReady             runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
  InternalIP:  10.161.233.80
  Hostname:    k8s-node2
Capacity:
  cpu:     2
  memory:  3618520Ki
  pods:    110
Allocatable:
  cpu:     2
  memory:  3516120Ki
  pods:    110
System Info:
  Machine ID:                 3c6ff97c6fbe4598b53fd04e08937468
  System UUID:                C6238BF8-8E60-4331-AEEA-6D0BA9106344
  Boot ID:                    84397607-908f-4ff8-8bdc-ff86c364dd32
  Kernel Version:             3.10.0-514.6.2.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://1.12.6
  Kubelet Version:            v1.7.5
  Kube-Proxy Version:         v1.7.5
PodCIDR:     10.68.0.0/24
ExternalID:  k8s-node2
Non-terminated Pods: (5 in total)
  Namespace    Name                               CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------    ----                               ------------  ----------  ---------------  -------------
  kube-system  etcd-k8s-node2                     0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-apiserver-k8s-node2           250m (12%)    0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-controller-manager-k8s-node2  200m (10%)    0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-proxy-qlmbx                   0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-scheduler-k8s-node2           100m (5%)     0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  550m (27%)    0 (0%)      0 (0%)           0 (0%)
Events:  <none>
It's good practice to run a describe command in order to see what's wrong with your node:
kubectl describe nodes <NODE_NAME>
e.g.: kubectl describe nodes k8s-node2
You should be able to start your investigations from there and add more info to this question if needed.
You need to install a network policy provider; this is one of the supported providers:
Weave Net for NetworkPolicy.
Command line to install:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
After a few seconds, a Weave Net pod should be running on each Node and any further pods you create will be automatically attached to the Weave network.
I think you may need to add tolerations and update the annotations for calico-node in the manifest you are using so that it can run on a master created by kubeadm. Kubeadm taints the master so that pods cannot run on it unless they have a toleration for that taint.
I believe you are using the https://docs.projectcalico.org/v2.5/getting-started/kubernetes/installation/hosted/calico.yaml manifest, which has the annotations (including tolerations) for K8s v1.5. You should check https://docs.projectcalico.org/v2.5/getting-started/kubernetes/installation/hosted/kubeadm/1.6/calico.yaml, which has the toleration syntax for K8s v1.6+.
Here is a snippet from the above with the annotations and tolerations:
metadata:
  labels:
    k8s-app: calico-node
  annotations:
    # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
    # reserves resources for critical add-on pods so that they can be rescheduled after
    # a failure. This annotation works in tandem with the toleration below.
    scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
  hostNetwork: true
  tolerations:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
  # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
  # This, along with the annotation above, marks this pod as a critical add-on.
  - key: CriticalAddonsOnly
    operator: Exists
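Applying that kubeadm-specific manifest would then be (a sketch; URL as linked above):
kubectl apply -f https://docs.projectcalico.org/v2.5/getting-started/kubernetes/installation/hosted/kubeadm/1.6/calico.yaml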