We deployed a new Kubernetes cluster, and it has 2 pods for CoreDNS.
$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-74ff55c5b-7v9bd 0/1 Running 0 7h22m
coredns-74ff55c5b-tfpqb 0/1 Running 0 7h23m
There are supposed to be 2 replicas, but 0 are READY.
When I check the logs to find out why the pods are not ready, I see many Unauthorized errors.
$ for p in $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do kubectl logs --tail 20 --namespace=kube-system $p; done
E0323 00:58:04.393710 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0323 00:58:34.184217 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Service: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0323 00:58:51.873269 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0323 00:59:00.966217 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0323 00:59:23.151006 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Service: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0323 00:59:47.362409 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: Unauthorized
E0323 00:59:48.563791 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Unauthorized
E0323 00:59:56.278764 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Service: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0323 00:58:07.504557 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0323 00:58:24.948534 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Service: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0323 00:58:33.605013 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0323 00:58:56.471477 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0323 00:59:20.436808 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Service: Unauthorized
E0323 00:59:21.200346 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
E0323 00:59:29.597663 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
Searching online, I found that CoreDNS uses the coredns service account, so I checked the account and all of its roles and bindings.
SERVICE ACCOUNT
$ kubectl get sa coredns -n kube-system -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
creationTimestamp: "2021-03-03T15:17:38Z"
name: coredns
namespace: kube-system
resourceVersion: "297"
uid: 13633498-2e6b-4ac4-bb34-f2d5c9e4d262
secrets:
- name: coredns-token-sg7p9
TOKEN SECRET
$ kubectl get secret coredns-token-sg7p9 -n kube-system
NAME TYPE DATA AGE
coredns-token-sg7p9 kubernetes.io/service-account-token 3 19d
CLUSTER ROLE
$ kubectl get clusterrole system:coredns -n kube-system -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
creationTimestamp: "2021-03-03T15:17:38Z"
managedFields:
- apiVersion: rbac.authorization.k8s.io/v1
fieldsType: FieldsV1
fieldsV1:
f:rules: {}
manager: kubeadm
operation: Update
time: "2021-03-03T15:17:38Z"
name: system:coredns
resourceVersion: "292"
uid: 35adc9a3-7415-4498-81b2-a4eab50882b1
rules:
- apiGroups:
- ""
resources:
- endpoints
- services
- pods
- namespaces
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
CLUSTER ROLE BINDINGS
$ kubectl get clusterrolebindings system:coredns -n kube-system -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
creationTimestamp: "2021-03-03T15:17:38Z"
managedFields:
- apiVersion: rbac.authorization.k8s.io/v1
fieldsType: FieldsV1
fieldsV1:
f:roleRef:
f:apiGroup: {}
f:kind: {}
f:name: {}
f:subjects: {}
manager: kubeadm
operation: Update
time: "2021-03-03T15:17:38Z"
name: system:coredns
resourceVersion: "293"
uid: 2d47c2cb-6641-4a62-b867-8a598ac3923a
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:coredns
subjects:
- kind: ServiceAccount
name: coredns
namespace: kube-system
From the Unauthorized error, I suspect it is related to the token, for example that the token expired and was not renewed. I searched online for how to renew the token for CoreDNS but didn't find anything.
I might be doing something wrong, but I can't find what.
There is help available for Pods that are not in the Running state, but not for Unauthorized errors in a Pod that is already running.
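One way to test the expired-token hypothesis is to decode the token and look at its claims. Note that Unauthorized is the API server's HTTP 401 response, meaning the token itself was rejected; missing RBAC permissions would instead show up as Forbidden (403), so the ClusterRole and ClusterRoleBinding above are probably not the cause. The sketch below assumes bash and GNU base64; jwt_payload is a hypothetical helper name, not part of kubectl.

```shell
#!/usr/bin/env bash
# jwt_payload is a hypothetical helper: it prints the decoded claims
# (the second dot-separated segment) of the JWT passed as $1.
jwt_payload() {
  local seg
  # JWT segments are base64url: map the URL-safe alphabet back to standard base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the '=' padding; restore it so base64 -d accepts the input.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}

# Usage against the secret shown above (needs cluster access):
#   TOKEN=$(kubectl get secret coredns-token-sg7p9 -n kube-system \
#             -o jsonpath='{.data.token}' | base64 -d)
#   jwt_payload "$TOKEN"
```

Secret-based service account tokens like this one normally carry no exp claim. If the payload shows none, expiry is unlikely, and it may be worth checking whether the API server's service account signing key changed after the token was issued; that is only one possibility.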
Related
I am new to Kubernetes and am trying to apply horizontal pod autoscaling to my existing application. From other Stack Overflow posts I learned that I need to install metrics-server, which I did, but it is not working and is unable to handle requests.
I tried a few more things but could not resolve the issue. I would really appreciate any help here.
Please let me know if you need any further details. Thanks in advance.
Steps followed:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
kubectl get deploy,svc -n kube-system | egrep metrics-server
deployment.apps/metrics-server 1/1 1 1 2m6s
service/metrics-server ClusterIP 10.32.0.32 <none> 443/TCP 2m6s
kubectl get pods -n kube-system | grep metrics-server
metrics-server-64cf6869bd-6gx88 1/1 Running 0 2m39s
vi ana_hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: ana-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: StatefulSet
name: common-services-auth
minReplicas: 1
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 80
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 160
k apply -f ana_hpa.yaml
horizontalpodautoscaler.autoscaling/ana-hpa created
k get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
ana-hpa StatefulSet/common-services-auth <unknown>/160%, <unknown>/80% 1 10 0 4s
k describe hpa ana-hpa
Name: ana-hpa
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Tue, 12 Apr 2022 17:01:25 +0530
Reference: StatefulSet/common-services-auth
Metrics: ( current / target )
resource memory on pods (as a percentage of request): <unknown> / 160%
resource cpu on pods (as a percentage of request): <unknown> / 80%
Min replicas: 1
Max replicas: 10
StatefulSet pods: 3 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 38s (x8 over 2m23s) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Warning FailedComputeMetricsReplicas 38s (x8 over 2m23s) horizontal-pod-autoscaler invalid metrics (2 invalid out of 2), first error is: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Warning FailedGetResourceMetric 23s (x9 over 2m23s) horizontal-pod-autoscaler failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
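For context on the <unknown> targets: the HPA controller derives the replica count from the ratio of current to target utilization, so it cannot produce a desired count until the metrics API answers. Below is a minimal sketch of the documented formula, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), ignoring the controller's tolerance and stabilization behaviour. hpa_desired is a hypothetical helper name, and the utilization figure in the usage line is invented for illustration.

```shell
#!/usr/bin/env bash
# hpa_desired CURRENT_REPLICAS CURRENT_UTILIZATION TARGET_UTILIZATION
# Prints ceil(current_replicas * current_utilization / target_utilization)
# using integer arithmetic. Utilization values are whole percentages.
hpa_desired() {
  local replicas=$1 current=$2 target=$3
  echo $(( (replicas * current + target - 1) / target ))
}

# Illustrative values only: 3 pods averaging 120% CPU against the 80% target.
hpa_desired 3 120 80   # prints 5
```

The real controller then clamps the result to the minReplicas..maxReplicas range. Utilization is measured as a percentage of each container's resource request, so the target pods also need CPU and memory requests set.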
kubectl get --raw /apis/metrics.k8s.io/v1beta1
Error from server (ServiceUnavailable): the server is currently unable to handle the request
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
Error from server (ServiceUnavailable): the server is currently unable to handle the request
kubectl edit deployments.apps -n kube-system metrics-server
Add hostNetwork: true
deployment.apps/metrics-server edited
kubectl get pods -n kube-system | grep metrics-server
metrics-server-5dc6dbdb8-42hw9 1/1 Running 0 10m
k describe pod metrics-server-5dc6dbdb8-42hw9 -n kube-system
Name: metrics-server-5dc6dbdb8-42hw9
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: pusntyn196.apac.avaya.com/10.133.85.196
Start Time: Tue, 12 Apr 2022 17:08:25 +0530
Labels: k8s-app=metrics-server
pod-template-hash=5dc6dbdb8
Annotations: <none>
Status: Running
IP: 10.133.85.196
IPs:
IP: 10.133.85.196
Controlled By: ReplicaSet/metrics-server-5dc6dbdb8
Containers:
metrics-server:
Container ID: containerd://024afb1998dce4c0bd5f4e58f996068ea37982bd501b54fda2ef8d5c1098b4f4
Image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
Image ID: k8s.gcr.io/metrics-server/metrics-server@sha256:5ddc6458eb95f5c70bd13fdab90cbd7d6ad1066e5b528ad1dcb28b76c5fb2f00
Port: 4443/TCP
Host Port: 4443/TCP
Args:
--cert-dir=/tmp
--secure-port=4443
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--kubelet-use-node-status-port
--metric-resolution=15s
State: Running
Started: Tue, 12 Apr 2022 17:08:26 +0530
Ready: True
Restart Count: 0
Requests:
cpu: 100m
memory: 200Mi
Liveness: http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:https/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g6p4g (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-g6p4g:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 2s
node.kubernetes.io/unreachable:NoExecute op=Exists for 2s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m31s default-scheduler Successfully assigned kube-system/metrics-server-5dc6dbdb8-42hw9 to pusntyn196.apac.avaya.com
Normal Pulled 2m32s kubelet Container image "k8s.gcr.io/metrics-server/metrics-server:v0.6.1" already present on machine
Normal Created 2m31s kubelet Created container metrics-server
Normal Started 2m31s kubelet Started container metrics-server
kubectl get --raw /apis/metrics.k8s.io/v1beta1
Error from server (ServiceUnavailable): the server is currently unable to handle the request
kubectl get pods -n kube-system | grep metrics-server
metrics-server-5dc6dbdb8-42hw9 1/1 Running 0 10m
kubectl logs -f metrics-server-5dc6dbdb8-42hw9 -n kube-system
E0412 11:43:54.684784 1 configmap_cafile_content.go:242] kube-system/extension-apiserver-authentication failed with : missing content for CA bundle "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
E0412 11:44:27.001010 1 configmap_cafile_content.go:242] key failed with : missing content for CA bundle "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
k logs -f metrics-server-5dc6dbdb8-42hw9 -n kube-system
I0412 11:38:26.447305 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0412 11:38:26.899459 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0412 11:38:26.899477 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0412 11:38:26.899518 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0412 11:38:26.899545 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0412 11:38:26.899546 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0412 11:38:26.899567 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0412 11:38:26.900480 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0412 11:38:26.900811 1 secure_serving.go:266] Serving securely on [::]:4443
I0412 11:38:26.900854 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0412 11:38:26.900965 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I0412 11:38:26.999960 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0412 11:38:26.999989 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0412 11:38:26.999970 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E0412 11:38:27.000087 1 configmap_cafile_content.go:242] kube-system/extension-apiserver-authentication failed with : missing content for CA bundle "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
E0412 11:38:27.000118 1 configmap_cafile_content.go:242] key failed with : missing content for CA bundle "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
kubectl top pods
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
Edit metrics server deployment yaml
Add - --kubelet-insecure-tls
k apply -f metric-server-deployment.yaml
serviceaccount/metrics-server unchanged
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged
service/metrics-server unchanged
deployment.apps/metrics-server configured
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged
kubectl get pods -n kube-system | grep metrics-server
metrics-server-5dc6dbdb8-42hw9 1/1 Running 0 10m
kubectl top pods
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
I also tried adding the following to the metrics-server deployment:
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
This can be resolved by editing the metrics-server deployment and adding hostNetwork: true after dnsPolicy: ClusterFirst in the pod spec:
kubectl edit deployments.apps -n kube-system metrics-server
insert:
hostNetwork: true
I hope this helps somebody with a bare-metal cluster:
$ helm --repo https://kubernetes-sigs.github.io/metrics-server/ --kubeconfig=$HOME/.kube/loc-cluster.config -n kube-system --set args='{--kubelet-insecure-tls}' upgrade --install metrics-server metrics-server
$ helm --kubeconfig=$HOME/.kube/loc-cluster.config -n kube-system uninstall metrics-server
Update: I deployed the metrics-server using the same command. Perhaps you can start fresh by removing existing resources and running:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
=======================================================================
It appears the --kubelet-insecure-tls flag was not configured correctly for the pod template in the deployment. The following should fix this:
Edit the existing deployment in the cluster with kubectl edit deployment/metrics-server -nkube-system.
Add the flag to the spec.containers[].args list, so that the deployment looks like this:
...
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --kubelet-insecure-tls # <== add this line
image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
...
Simply save your changes and let the deployment roll out the updated pods. You can use watch -n1 kubectl get deployment/metrics-server -nkube-system and wait for the UP-TO-DATE column to show 1.
Like this:
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 16m
Verify with kubectl top nodes. It will show something like
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
docker-desktop 222m 5% 1600Mi 41%
I've just verified this to work on a local setup. Let me know if this helps :)
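If editing interactively is inconvenient, the same change can be sketched as a JSON patch. This assumes metrics-server is the first container in the pod template (index 0) and that the flag is not already present; insecure_tls_patch is a hypothetical helper name. Keep in mind that --kubelet-insecure-tls turns off verification of the kubelet serving certificates, so it is better suited to test clusters.

```shell
#!/usr/bin/env bash
# Prints a JSON Patch document that appends the flag to the args list of
# the first container in the pod template. The index 0 is an assumption.
insecure_tls_patch() {
  printf '%s' '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
}

# Usage (needs cluster access):
#   kubectl patch deployment metrics-server -n kube-system \
#     --type=json -p "$(insecure_tls_patch)"
#   kubectl rollout status deployment/metrics-server -n kube-system
```

Patching the Deployment changes its pod template, so a new ReplicaSet and pod should appear; if the pod name stays the same afterwards, the change most likely did not reach the template.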
Please configure the aggregation layer correctly and carefully. This link may help: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-aggregation-layer/
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
name: <name of the registration object>
spec:
group: <API group name this extension apiserver hosts>
version: <API version this extension apiserver hosts>
groupPriorityMinimum: <priority this APIService for this group, see API documentation>
versionPriority: <prioritizes ordering of this version within a group, see API documentation>
service:
namespace: <namespace of the extension apiserver service>
name: <name of the extension apiserver service>
caBundle: <pem encoded ca cert that signs the server cert used by the webhook>
It would be helpful to provide the output of kubectl version.
For me, on EKS with helmfile and the metrics-server chart, I had to set the following in values.yaml:
containerPort: 10250
The value was enforced by default to 4443 for an unknown reason when I first deployed the chart.
See doc:
https://github.com/kubernetes-sigs/metrics-server/blob/master/charts/metrics-server/values.yaml#L62
https://aws.amazon.com/premiumsupport/knowledge-center/eks-metrics-server/#:~:text=confirm%20that%20your%20security%20groups
Then kubectl top nodes and kubectl describe apiservice v1beta1.metrics.k8s.io were working.
First of all, execute the following command:
kubectl get apiservices
Then check the availability (status) of the kube-system/metrics-server service.
In case the availability is True:
Add hostNetwork: true to the spec of your metrics-server deployment by executing the following command:
kubectl edit deployment -n kube-system metrics-server
It should look like the following:
...
spec:
hostNetwork: true
...
Setting hostNetwork to true means that the Pod uses the network namespace of the host it is running on.
In case the availability is False (MissingEndpoints):
Download metrics-server:
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.0/components.yaml
Remove (legacy) metrics server:
kubectl delete -f components.yaml
Edit downloaded file and add - --kubelet-insecure-tls to args list:
...
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --kubelet-insecure-tls # add this line
...
Create service once again:
kubectl apply -f components.yaml
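The availability check at the start of this answer can be scripted. The sketch below filters the output of kubectl get apiservices for the metrics API and prints its AVAILABLE column, including the parenthesised reason when there is one. metrics_api_status is a hypothetical helper name, and the parsing assumes the default NAME / SERVICE / AVAILABLE / AGE column layout.

```shell
#!/usr/bin/env bash
# metrics_api_status reads `kubectl get apiservices` output on stdin and
# prints the AVAILABLE column for v1beta1.metrics.k8s.io, for example
# "True" or "False (MissingEndpoints)". The helper name is hypothetical.
metrics_api_status() {
  awk '$1 == "v1beta1.metrics.k8s.io" {
         # Column 3 is True/False; a parenthesised reason may follow it.
         if ($4 ~ /^\(/) print $3, $4; else print $3
       }'
}

# Usage (needs cluster access):
#   kubectl get apiservices | metrics_api_status
```

An empty result means the APIService is not registered at all, which points back at the components.yaml apply step.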
I am trying to install Rancher on AKS using Helm 3, following the documentation below:
https://rancher.com/docs/rancher/v2.5/en/installation/install-rancher-on-k8s/
helm upgrade --install rancher rancher-stable/rancher -f rancher.yaml -n cattle-system --set ingress.tls.source=rancher --set proxy="export https_proxy=http://x.x.x.x/" --set proxy="export http_proxy=http://x.x.x.x/"
[rancher]$ kubectl get pod -n cattle-system
NAME READY STATUS RESTARTS AGE
rancher-854b498848-gw4cb 1/1 Running 0 15m
rancher-854b498848-nnbqs 1/1 Running 0 15m
rancher-854b498848-wbcvs 1/1 Running 0 15m
helm-operation-jkjzb 0/2 Completed 0 29m
rancher-webhook-6979fbd4bf-qzkgl 1/1 Running 0 5d19h
Helm created the ConfigMap, certificate, ServiceAccount, Service, and CRD.
---
# Source: rancher/templates/clusterRoleBinding.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: rancher
namespace: cattle-system
labels:
app: rancher
app.kubernetes.io/managed-by: Helm
chart: rancher-2.6.2
heritage: Helm
release: rancher
subjects:
- kind: ServiceAccount
name: rancher
namespace: cattle-system
roleRef:
kind: ClusterRole
name: cluster-admin
apiGroup: rbac.authorization.k8s.io
Please find the pod logs:
POD 1
=====
[rancher]$ kubectl logs -f rancher-854b498848-nnbqs -n cattle-system | grep ERROR
2021/12/14 09:35:46 [ERROR] error syncing 'cattle-system/serving-cert': handler tls-storage: Secret "serving-cert" is invalid: [data[tls.crt]: Required value, data[tls.key]: Required value], requeuing
2021/12/14 09:35:48 [ERROR] Failed to connect to peer wss://10.241.1.180/v3/connect [local ID=10.241.0.234]: websocket: bad handshake
2021/12/14 09:35:50 [ERROR] Failed to handling tunnel request from 10.241.0.235:53604: response 400: cluster not found
POD 2
=====
[rancher]$ kubectl logs -f rancher-854b498848-wbcvs -n cattle-system | grep ERROR
2021/12/14 09:35:48 [ERROR] Failed to connect to peer wss://10.241.1.180/v3/connect [local ID=10.241.0.235]: websocket: bad handshake
POD 3
==========
[rancher]$ kubectl logs -f rancher-854b498848-gw4cb -n cattle-system | grep ERROR | head -10
2021/12/14 09:35:36 [ERROR] error syncing 'cattle-system/serving-cert': handler tls-storage: Secret "serving-cert" is invalid: [data[tls.crt]: Required value, data[tls.key]: Required value], requeuing
2021/12/14 09:35:48 [ERROR] Failed to handling tunnel request from 10.241.0.235:55192: response 400: cluster not found
2021/12/14 09:36:17 [ERROR] error parsing azure-group-cache-size, skipping update strconv.Atoi: parsing "": invalid syntax
2021/12/14 09:36:17 [ERROR] error syncing 'cluster-admin': handler auth-prov-v2-roletemplate: clusterroles.rbac.authorization.k8s.io "cluster-admin" not found, requeuing
My goal is to monitor applications running in a Kubernetes cluster from a remote Prometheus server. Based on: https://medium.com/@amjadhussain3751/monitor-remote-kubernetes-cluster-using-prometheus-a3781b041745
I did:
Create a service account which has permissions to read and watch the pods.
Generate token from the service account.
I put the token in the prometheus.yaml as described in the post:
-------
- job_name: kubernetes-service-endpoints
scrape_interval: 15s
scrape_timeout: 10s
metrics_path: /metrics
scheme: http
kubernetes_sd_configs:
- api_server: http://kk-kk-0-73.mycompany.com:6443
role: endpoints
tls_config:
insecure_skip_verify: true
bearer_token: "ZXlKaGJHY2lP ......NJbk4xWWlJNkluTjVjM1JsYlRwelpYSjJhV05sWVdOamIzVnVkRHB0YjI1cGRHOXlhVzVuT25CeWIyMHRiVzl1TFdGalkzUWlmUS5WY2tCNlkxUVE1cmRUZGRLVGVnVzBRMVhfR0t1dFBKYWpybkNybnZLQkgtTjFlWjV0bklvcmJONFBXdTA3TWFNMXp2Z3pJd3JmS2h1RG02M0hyVkJocjA0QW5xY3FLXzBfTDE5cXc5TEUwX0pINXgycmhHeWtHc0Jmd0xTVWVSWTZCY1JES2d2TGJ5QmhIVG1qU0QxcjRHUXI0TVlvSmtldUk5bTNpdG9hX2ZqLVpaNjVCNll2eUVFOUxLQ01RWVpNV1FZczgteE1hWUJYejUwQXdnSElPS3E3NEpuNkdFODB4cjRMZUpYNXMwbDUzTmpIY3FPMVRGMDBfbDM1VnZpSlZqR3VXNmNFRXhYaDhNZ3RNV3M3SlQ0a0pvNkFDNWhCLW1VQjJtNHJlLXMxcU1qNVc5X2FRZ2dGc3gtNGlKU285bG0zRi15SS1uNFg3YVFnbGNWQ3c="
namespaces:
names: []
bearer_token: "ZXlKaGJHY2l ..... 0T1RjMk9TSXNJbk4xWWlJNkluTjVjM1JsYlRwelpYSjJhV05sWVdOamIzVnVkRHB0YjI1cGRHOXlhVzVuT25CeWIyMHRiVzl1TFdGalkzUWlmUS5WY2tCNlkxUVE1cmRUZGRLVGVnVzBRMVhfR0t1dFBKYWpybkNybnZLQkgtTjFlWjV0bklvcmJONFBXdTA3TWFNMXp2Z3pJd3JmS2h1RG02M0hyVkJocjA0QW5xY3FLXzBfTDE5cXc5TEUwX0pINXgycmhHeWtHc0Jmd0xTVWVSWTZCY1JES2d2TGJ5QmhIVG1qU0QxcjRHUXI0TVlvSmtldUk5bTNpdG9hX2ZqLVpaNjVCNll2eUVFOUxLQ01RWVpNV1FZczgteE1hWUJYejUwQXdnSElPS3E3NEpuNkdFODB4cjRMZUpYNXMwbDUzTmpIY3FPMVRGMDBfbDM1VnZpSlZqR3VXNmNFRXhYaDhNZ3RNV3M3SlQ0a0pvNkFDNWhCLW1VQjJtNHJlLXMxcU1qNVc5X2FRZ2dGc3gtNGlKU285bG0zRi15SS1uNFg3YVFnbGNWQ3c="
relabel_configs:
----
When I start Prometheus, I get the following errors:
---
level=error ts=2021-04-29T20:51:27.255Z caller=klog.go:104 component=k8s_client_runtime func=Errorf msg="Unexpected error when reading response body: read tcp pm.pm.pm.15:51652->kk.kk.0.73:6443: read: connection reset by peer"
level=error ts=2021-04-29T20:51:27.256Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: unexpected error when reading response body. Please retry. Original error: read tcp pm.pm.pm.15:51652->kk.kk.0.73:6443: read: connection reset by peer"
level=debug ts=2021-04-29T20:51:27.445Z caller=klog.go:55 component=k8s_client_runtime func=Verbose.Infof msg="Listing and watching *v1.Endpoints from pkg/mod/k8s.io/client-go@v0.20.2/tools/cache/reflector.go:167"
level=debug ts=2021-04-29T20:51:27.495Z caller=klog.go:72 component=k8s_client_runtime func=Infof msg="GET http://kk.kk.0-73.mycommany.com:6443/api/v1/endpoints?limit=500&resourceVersion=0 400 Bad Request in 49 milliseconds"
level=error ts=2021-04-29T20:51:27.495Z caller=klog.go:104 component=k8s_client_runtime func=Errorf msg="Unexpected error when reading response body: read tcp pm.pm.pm.15:51654->kk.kk.0.73:6443: read: connection reset by peer"
level=error ts=2021-04-29T20:51:27.495Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: unexpected error when reading response body. Please retry. Original error: read tcp pm.pm.pm.15:51654->kk.kk.0.73:6443: read: connection reset by peer"
---
Note: the following represents the ip address of my prometheus server and k8s master node ip
prometheus server ip: pm.pm.pm.15
K8s cluster master node ip: kk.kk.0.73
By the way, telnet from pm.pm.pm.15 to kk.kk.0.73:6443 works.
Did I set the parameters correctly? How to troubleshoot? Any suggestions appreciated.
-dsun
Some contents of my object config YAML files:
--- service_account.yaml ---
apiVersion: v1
kind: ServiceAccount
metadata:
name: prom-mon-acct
namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus-mon-acct-rb
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prom-mon-acct
namespace: monitoring
----cluster_role.yaml ---
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
rules:
- apiGroups: [""]
resources:
- nodes
- nodes/proxy
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
- apiGroups:
- extensions
resources:
- ingresses
verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
[update 1]
Last few lines from: kubectl logs kube-apiserver-kk-kk-0-73.mycomoany.com -n kube-system
....
I0429 22:30:36.256704 1 clientconn.go:948] ClientConn switching balancer to "pick_first"
I0429 22:31:18.638120 1 client.go:360] parsed scheme: "passthrough"
I0429 22:31:18.638164 1 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{https://127.0.0.1:2379 <nil> 0 <nil>}] <nil> <nil>}
I0429 22:31:18.638175 1 clientconn.go:948] ClientConn switching balancer to "pick_first"
I0429 22:31:56.333614 1 client.go:360] parsed scheme: "passthrough"
I0429 22:31:56.333660 1 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{https://127.0.0.1:2379 <nil> 0 <nil>}] <nil> <nil>}
I0429 22:31:56.333671 1 clientconn.go:948] ClientConn switching balancer to "pick_first"
I0429 22:32:38.907517 1 client.go:360] parsed scheme: "passthrough"
I0429 22:32:38.907560 1 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{https://127.0.0.1:2379 <nil> 0 <nil>}] <nil> <nil>}
I0429 22:32:38.907570 1 clientconn.go:948] ClientConn switching balancer to "pick_first"
I0429 22:33:14.696739 1 client.go:360] parsed scheme: "passthrough"
I0429 22:33:14.696781 1 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{https://127.0.0.1:2379 <nil> 0 <nil>}] <nil> <nil>}
I0429 22:33:14.696792 1 clientconn.go:948] ClientConn switching balancer to "pick_first"
[update 2]
test with: # curl -X GET $YOUR_API_SERVER/api --header "Authorization: Bearer $YOUR_TOKEN" and using https instead of http
NAMESPACE NAME TYPE DATA AGE
default default-token-j2bvm kubernetes.io/service-account-token 3 4d14h ---> token from this account is working with curl api server, but don't have list pod privs
monitoring prom-mon-acct-token-4fchr kubernetes.io/service-account-token 3 19h ---> token from this service account not working with curl api server
I know the prom-mon-acct service account has the privileges to list pods; I need to figure out how to make its token work against the API server.
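One thing that stands out in the configuration above: both bearer_token values begin with ZXlK, which is how the usual JWT prefix eyJ appears after base64 encoding. That suggests the value may have been copied from the Secret's data field without decoding it, which could explain why this token is rejected even though the API server is reachable. A minimal sketch, assuming bash and GNU base64; normalize_token is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# normalize_token prints a usable bearer token from $1.
# - A value starting with "eyJ" already looks like a JWT and is passed through.
# - A value starting with "ZXlK" looks like a base64-encoded JWT and is decoded.
normalize_token() {
  case "$1" in
    eyJ*)  printf '%s' "$1" ;;
    ZXlK*) printf '%s' "$1" | base64 -d ;;
    *)     echo "unrecognized token format" >&2; return 1 ;;
  esac
}

# Usage (needs cluster access; <api-server> is a placeholder):
#   RAW=$(kubectl get secret prom-mon-acct-token-4fchr -n monitoring \
#           -o jsonpath='{.data.token}')
#   TOKEN=$(normalize_token "$RAW")
#   curl -k -H "Authorization: Bearer $TOKEN" https://<api-server>:6443/api
```

The decoded value is what would go into bearer_token. Separately, the scrape config still points api_server at http:// on port 6443; the 400 Bad Request in the log is consistent with plain HTTP being sent to a TLS port, so switching that URL to https, as in the curl test, is probably needed as well.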
I'm new to the Kubernetes world, so forgive me if I make mistakes. I'm trying to deploy the Kubernetes dashboard.
My cluster contains three masters and three workers; the workers are drained and unschedulable so that the dashboard is installed on the master nodes:
[root@pp-tmp-test20 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
pp-tmp-test20 Ready master 2d2h v1.15.2
pp-tmp-test21 Ready master 37h v1.15.2
pp-tmp-test22 Ready master 37h v1.15.2
pp-tmp-test23 Ready,SchedulingDisabled worker 36h v1.15.2
pp-tmp-test24 Ready,SchedulingDisabled worker 36h v1.15.2
pp-tmp-test25 Ready,SchedulingDisabled worker 36h v1.15.2
I'm trying to deploy the Kubernetes dashboard via this URL:
[root@pp-tmp-test20 ~]# kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
After this, the pod kubernetes-dashboard-5698d5bc9-ql6q8 is scheduled on my master node pp-tmp-test20/172.31.68.220.
The pod:
kube-system kubernetes-dashboard-5698d5bc9-ql6q8 1/1 Running 1 7m11s 10.244.0.7 pp-tmp-test20 <none> <none>
The pod's logs:
[root@pp-tmp-test20 ~]# kubectl logs kubernetes-dashboard-5698d5bc9-ql6q8 -n kube-system
2019/08/14 10:14:57 Starting overwatch
2019/08/14 10:14:57 Using in-cluster config to connect to apiserver
2019/08/14 10:14:57 Using service account token for csrf signing
2019/08/14 10:14:58 Successful initial request to the apiserver, version: v1.15.2
2019/08/14 10:14:58 Generating JWE encryption key
2019/08/14 10:14:58 New synchronizer has been registered: kubernetes-dashboard-key-holder-kube-system. Starting
2019/08/14 10:14:58 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system
2019/08/14 10:14:59 Initializing JWE encryption key from synchronized object
2019/08/14 10:14:59 Creating in-cluster Heapster client
2019/08/14 10:14:59 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2019/08/14 10:14:59 Auto-generating certificates
2019/08/14 10:14:59 Successfully created certificates
2019/08/14 10:14:59 Serving securely on HTTPS port: 8443
2019/08/14 10:15:29 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2019/08/14 10:15:59 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
The describe output of the pod:
[root@pp-tmp-test20 ~]# kubectl describe pod kubernetes-dashboard-5698d5bc9-ql6q8 -n kube-system
Name: kubernetes-dashboard-5698d5bc9-ql6q8
Namespace: kube-system
Priority: 0
Node: pp-tmp-test20/172.31.68.220
Start Time: Wed, 14 Aug 2019 16:58:39 +0200
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=5698d5bc9
Annotations: <none>
Status: Running
IP: 10.244.0.7
Controlled By: ReplicaSet/kubernetes-dashboard-5698d5bc9
Containers:
kubernetes-dashboard:
Container ID: docker://40edddf7a9102d15e3b22f4bc6f08b3a07a19e4841f09360daefbce0486baf0e
Image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1
Image ID: docker-pullable://k8s.gcr.io/kubernetes-dashboard-amd64@sha256:0ae6b69432e78069c5ce2bcde0fe409c5c4d6f0f4d9cd50a17974fea38898747
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
State: Running
Started: Wed, 14 Aug 2019 16:58:43 +0200
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 14 Aug 2019 16:58:41 +0200
Finished: Wed, 14 Aug 2019 16:58:42 +0200
Ready: True
Restart Count: 1
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-ptw78 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kubernetes-dashboard-token-ptw78:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-ptw78
Optional: false
QoS Class: BestEffort
Node-Selectors: dashboard=true
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m41s default-scheduler Successfully assigned kube-system/kubernetes-dashboard-5698d5bc9-ql6q8 to pp-tmp-test20.tec.prj.in.phm.education.gouv.fr
Normal Pulled 2m38s (x2 over 2m40s) kubelet, pp-tmp-test20 Container image "k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1" already present on machine
Normal Created 2m37s (x2 over 2m39s) kubelet, pp-tmp-test20 Created container kubernetes-dashboard
Normal Started 2m37s (x2 over 2m39s) kubelet, pp-tmp-test20 Started container kubernetes-dashboard
The describe output of the dashboard service:
[root@pp-tmp-test20 ~]# kubectl describe svc/kubernetes-dashboard -n kube-system
Name: kubernetes-dashboard
Namespace: kube-system
Labels: k8s-app=kubernetes-dashboard
Annotations: <none>
Selector: k8s-app=kubernetes-dashboard
Type: ClusterIP
IP: 10.110.236.88
Port: <unset> 443/TCP
TargetPort: 8443/TCP
Endpoints: 10.244.0.7:8443
Session Affinity: None
Events: <none>
docker ps on the master running the pod:
[root@pp-tmp-test20 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
40edddf7a910 f9aed6605b81 "/dashboard --inse..." 7 minutes ago Up 7 minutes k8s_kubernetes-dashboard_kubernetes-dashboard-5698d5bc9-ql6q8_kube-system_f785d4bd-2e67-4daa-9f6c-19f98582fccb_1
e7f3820f1cf2 k8s.gcr.io/pause:3.1 "/pause" 7 minutes ago Up 7 minutes k8s_POD_kubernetes-dashboard-5698d5bc9-ql6q8_kube-system_f785d4bd-2e67-4daa-9f6c-19f98582fccb_0
[root@pp-tmp-test20 ~]# docker logs 40edddf7a910
2019/08/14 14:58:43 Starting overwatch
2019/08/14 14:58:43 Using in-cluster config to connect to apiserver
2019/08/14 14:58:43 Using service account token for csrf signing
2019/08/14 14:58:44 Successful initial request to the apiserver, version: v1.15.2
2019/08/14 14:58:44 Generating JWE encryption key
2019/08/14 14:58:44 New synchronizer has been registered: kubernetes-dashboard-key-holder-kube-system. Starting
2019/08/14 14:58:44 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system
2019/08/14 14:58:44 Initializing JWE encryption key from synchronized object
2019/08/14 14:58:44 Creating in-cluster Heapster client
2019/08/14 14:58:44 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2019/08/14 14:58:44 Auto-generating certificates
2019/08/14 14:58:44 Successfully created certificates
2019/08/14 14:58:44 Serving securely on HTTPS port: 8443
2019/08/14 14:59:14 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2019/08/14 14:59:44 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2019/08/14 15:00:14 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
1/ On my master I start the proxy
[root@pp-tmp-test20 ~]# kubectl proxy
Starting to serve on 127.0.0.1:8001
2/ I launch Firefox with X11 forwarding from my master and open this URL:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login
This is the error message I get in the browser:
Error: 'dial tcp 10.244.0.7:8443: connect: no route to host'
Trying to reach: 'https://10.244.0.7:8443/'
At the same time I got these errors in the console where I launched the proxy:
I0814 16:10:05.836114 20240 log.go:172] http: proxy error: context canceled
I0814 16:10:06.198701 20240 log.go:172] http: proxy error: context canceled
I0814 16:13:21.708190 20240 log.go:172] http: proxy error: unexpected EOF
I0814 16:13:21.708229 20240 log.go:172] http: proxy error: unexpected EOF
I0814 16:13:21.708270 20240 log.go:172] http: proxy error: unexpected EOF
I0814 16:13:39.335483 20240 log.go:172] http: proxy error: context canceled
I0814 16:13:39.716360 20240 log.go:172] http: proxy error: context canceled
But after refreshing the browser n times (randomly), I'm able to reach the login interface and enter the token (created earlier):
Dashboard_login
But... the same error occurs again:
Dashboard_login_error
After hitting the 'sign in' button n times, I'm able to get to the dashboard... for a few seconds.
dashboard_interface_1
dashboard_interface_2
After that, the dashboard starts to produce the same errors while I'm exploring the interface:
dashboard_interface_error_1
dashboard_interface_error_2
I looked at the pod logs, where we can see some traffic:
[root@pp-tmp-test20 ~]# kubectl logs kubernetes-dashboard-5698d5bc9-ql6q8 -n kube-system
2019/08/14 14:16:56 Getting list of all services in the cluster
2019/08/14 14:16:56 [2019-08-14T14:16:56Z] Outcoming response to 10.244.0.1:56140 with 200 status code
2019/08/14 14:17:01 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2019/08/14 14:17:22 [2019-08-14T14:17:22Z] Incoming HTTP/2.0 GET /api/v1/login/status request from 10.244.0.1:56140: {}
2019/08/14 14:17:22 [2019-08-14T14:17:22Z] Outcoming response to 10.244.0.1:56140 with 200 status code
2019/08/14 14:17:22 [2019-08-14T14:17:22Z] Incoming HTTP/2.0 GET /api/v1/csrftoken/token request from 10.244.0.1:56140: {}
2019/08/14 14:17:22 [2019-08-14T14:17:22Z] Outcoming response to 10.244.0.1:56140 with 200 status code
2019/08/14 14:17:22 [2019-08-14T14:17:22Z] Incoming HTTP/2.0 POST /api/v1/token/refresh request from 10.244.0.1:56140: { contents hidden }
2019/08/14 14:17:22 [2019-08-14T14:17:22Z] Outcoming response to 10.244.0.1:56140 with 200 status code
2019/08/14 14:17:22 [2019-08-14T14:17:22Z] Incoming HTTP/2.0 GET /api/v1/settings/global/cani request from 10.244.0.1:56140: {}
2019/08/14 14:17:22 [2019-08-14T14:17:22Z] Outcoming response to 10.244.0.1:56140 with 200 status code
2019/08/14 14:17:22 [2019-08-14T14:17:22Z] Incoming HTTP/2.0 GET /api/v1/settings/global request from 10.244.0.1:56140: {}
2019/08/14 14:17:22 Cannot find settings config map: configmaps "kubernetes-dashboard-settings" not found
And the pod logs again:
[root@pp-tmp-test20 ~]# kubectl logs kubernetes-dashboard-5698d5bc9-ql6q8 -n kube-system
Error from server: Get https://172.31.68.220:10250/containerLogs/kube-system/kubernetes-dashboard-5698d5bc9-ql6q8/kubernetes-dashboard: Forbidden
What am I doing wrong? Could you please suggest some ways to investigate?
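One way to investigate the 'no route to host' error is to check, from each master in turn, whether the pod IP and port from the error message accept a TCP connection. The sketch below is a hypothetical helper (probe_tcp is not a Kubernetes tool) that only reports whether the connect succeeds. It assumes Python 3 is available on the node.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and return "ok" or the OS error text."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        # Covers refused connections, timeouts and "no route to host".
        return f"failed: {exc}"


# Example, using the pod IP and port from the error message above:
#   print(probe_tcp("10.244.0.7", 8443))
```

If the probe succeeds on pp-tmp-test20 but fails on the other masters, that would suggest looking at pod-network routing between the nodes rather than at the dashboard itself.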
EDIT:
The service account I used:
# cat dashboard-adminuser.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system
# cat dashboard-adminuser-ClusterRoleBinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system
It seems Heapster is deprecated in Kubernetes in favor of metrics-server: Support metrics API #2986 & Heapster Deprecation Timeline.
I had deployed a dashboard version that uses Heapster. That dashboard version is not compatible with my Kubernetes version (1.15). A possible way to resolve the issue is to install dashboard v2.0.0-beta3:
# kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta3/aio/deploy/recommended.yaml
It seems that the kubernetes-dashboard service account doesn't have access to all Kubernetes resources because it is bound to the kubernetes-dashboard-minimal role. If you bind the service account to the cluster-admin role, you won't get such issues. The YAML below can be used to do this.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard
  labels:
    k8s-app: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system
I'm new to kubernetes, and I'm trying to create a cluster. But after I configure the master with the kubeadm command, I see there are some errors with the pods, and this results in a master that is always in a NotReady state.
All seems to originate from the fact that kube-proxy cannot list the endpoints and the services... and for this reason (or so I understand) cannot update the iptables.
Here is my kubectl version:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
And here are the logs from the kube-proxy pod:
$ kubectl logs -n kube-system kube-proxy-xjxck
W0430 12:33:28.887260 1 server_others.go:267] Flag proxy-mode="" unknown, assuming iptables proxy
W0430 12:33:28.913671 1 node.go:113] Failed to retrieve node info: Unauthorized
I0430 12:33:28.915780 1 server_others.go:147] Using iptables Proxier.
W0430 12:33:28.916065 1 proxier.go:314] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0430 12:33:28.916089 1 proxier.go:319] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0430 12:33:28.917555 1 server.go:555] Version: v1.14.1
I0430 12:33:28.959345 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0430 12:33:28.960392 1 config.go:202] Starting service config controller
I0430 12:33:28.960444 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I0430 12:33:28.960572 1 config.go:102] Starting endpoints config controller
I0430 12:33:28.960609 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
E0430 12:33:28.970720 1 event.go:191] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"fh-ubuntu01.159a40901fa85264", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"fh-ubuntu01", UID:"fh-ubuntu01", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kube-proxy.", Source:v1.EventSource{Component:"kube-proxy", Host:"fh-ubuntu01"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbf2a2e0639406264, ext:334442672, loc:(*time.Location)(0x2703080)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbf2a2e0639406264, ext:334442672, loc:(*time.Location)(0x2703080)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Unauthorized' (will not retry!)
E0430 12:33:28.970939 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Unauthorized
E0430 12:33:28.971106 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Service: Unauthorized
E0430 12:33:29.977038 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Unauthorized
E0430 12:33:29.979890 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Service: Unauthorized
E0430 12:33:30.980098 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Unauthorized
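For a log like the one above, it can be useful to tally which resources fail and with which reason before changing any roles. The Python sketch below is an illustrative helper (tally_reflector_errors is a made-up name) that counts the client-go reflector errors per resource and reason. As a reading aid: the API server reports "Unauthorized" (HTTP 401) when it does not accept the credentials presented, and "Forbidden" (HTTP 403) when the caller is identified but RBAC denies the request, so the two point to different places to look.

```python
import re
from collections import Counter

# Matches client-go reflector lines such as
# "... Failed to list *v1.Endpoints: Unauthorized"
_PATTERN = re.compile(r"Failed to (list|watch) \*(\S+): (.+)$")


def tally_reflector_errors(log_text: str) -> Counter:
    """Count (resource, reason) pairs found in reflector error lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = _PATTERN.search(line)
        if match:
            _verb, resource, reason = match.groups()
            counts[(resource, reason.strip())] += 1
    return counts


# Example: feed it the output of "kubectl logs -n kube-system kube-proxy-xjxck"
# and print the most frequent failures first:
#   for (resource, reason), n in tally_reflector_errors(text).most_common():
#       print(n, resource, reason)
```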
Now I've created a new ClusterRoleBinding this way:
$ kubectl create clusterrolebinding kube-proxy-binding --clusterrole=system:node-proxier --user=system:kube-proxy
and if I describe the ClusterRole, I can see this:
$ kubectl describe clusterrole system:node-proxier
Name: system:node-proxier
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
events [] [] [create patch update]
nodes [] [] [get]
endpoints [] [] [list watch]
services [] [] [list watch]
So the user "system:kube-proxy" should be able to list the endpoints and the services, right? Now, if I print the YAML of the kube-proxy ConfigMap, I get this:
$ kubectl get configmap kube-proxy -n kube-system -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: ""
    configSyncPeriod: 15m0s
    conntrack:
      max: null
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      syncPeriod: 30s
    kind: KubeProxyConfiguration
    metricsBindAddress: 127.0.0.1:10249
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    resourceContainer: /kube-proxy
    udpIdleTimeout: 250ms
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://10.0.1.1:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2019-03-21T10:34:03Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "4458115"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: d8a454fb-4bc4-11e9-b0b4-00155d044109
The "user: default" entry confuses me: which user is kube-proxy trying to authenticate as? Is there an actual user named "default"?
Thank you very much!
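On the "user: default" question: in a kubeconfig, the user name is only a local label that ties a context to a set of credentials. The identity the API server sees comes from the credential itself, here the token read from tokenFile. A service-account token is presented as the username system:serviceaccount:<namespace>:<name>. The Python sketch below illustrates how RBAC subjects would then match such an identity. The function names are hypothetical, and it assumes the kube-proxy pod runs as the "kube-proxy" service account in "kube-system" (the usual kubeadm setup), which should be verified on the cluster.

```python
def service_account_username(namespace: str, name: str) -> str:
    """Username the API server assigns to a service-account token."""
    return f"system:serviceaccount:{namespace}:{name}"


def binding_applies(subjects: list, username: str, groups: tuple = ()) -> bool:
    """Return True if any RBAC subject refers to the given identity."""
    for subject in subjects:
        kind = subject.get("kind")
        if kind == "User" and subject.get("name") == username:
            return True
        if kind == "Group" and subject.get("name") in groups:
            return True
        if kind == "ServiceAccount":
            sa_name = service_account_username(
                subject.get("namespace", ""), subject.get("name", ""))
            if sa_name == username:
                return True
    return False


if __name__ == "__main__":
    # Assumption: kube-proxy runs as service account kube-system/kube-proxy.
    identity = service_account_username("kube-system", "kube-proxy")
    by_user = [{"kind": "User", "name": "system:kube-proxy"}]
    by_sa = [{"kind": "ServiceAccount", "name": "kube-proxy",
              "namespace": "kube-system"}]
    print(binding_applies(by_user, identity))  # False
    print(binding_applies(by_sa, identity))    # True
```

Under that assumption, a binding created with --user=system:kube-proxy would not cover a pod that authenticates with a service-account token.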
Output from kubectl get po -n kube-system:
$ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-27qck 0/1 Pending 0 7d15h
coredns-fb8b8dccf-dd6bh 0/1 Pending 0 7d15h
kube-apiserver-fh-ubuntu01 1/1 Running 1 7d15h
kube-controller-manager-fh-ubuntu01 1/1 Running 0 7d15h
kube-proxy-xjxck 1/1 Running 0 43h
kube-scheduler-fh-ubuntu01 1/1 Running 1 7d15h
weave-net-psqh5 1/2 CrashLoopBackOff 2144 7d15h
The cluster components look healthy:
$ kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-2 Healthy {"health": "true"}
etcd-3 Healthy {"health": "true"}
etcd-0 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
Run the command below to check cluster health:
kubectl get cs
Then check the status of the control plane pods:
kubectl get po -n kube-system
The issue seems to be with the weave-net-psqh5 pod. Find out why it is going into CrashLoopBackOff status,
and share the logs from weave-net-psqh5.
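To pick out the unhealthy pods from a listing like the one in the question, a small filter over the kubectl get po output is enough. The following Python sketch uses a hypothetical helper name (not_ready_pods) and assumes the default column order NAME, READY, STATUS, RESTARTS, AGE.

```python
def not_ready_pods(listing: str) -> list:
    """Return (name, ready, status, restarts) for pods that are not fully ready."""
    problems = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a pod row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, ready, status, restarts = fields[:4]
        ready_count, _, total = ready.partition("/")
        if ready_count != total or status != "Running":
            problems.append((name, ready, status, int(restarts)))
    return problems


# Example: pass the text printed by "kubectl get po -n kube-system" and
# print each problem pod on its own line:
#   for pod in not_ready_pods(text):
#       print(*pod)
```

For the listing in the question this would report the two Pending coredns pods and weave-net-psqh5, whose restart count of 2144 makes it the first pod whose logs are worth reading.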