Some pods not resolving via CoreDNS in k3s - kubernetes

I am running k3s cluster on RPis with Ubuntu Server 20.04 and have a drive mounted onto the master node to serve as shared NFS storage.
For some reason I can only access the server using the ClusterIP of the service, but no the name. CoreDNS is running and the logs suggest that it's working fine, but it's not used by the test pod that I try to mount the NFS share onto.
There is a question for Raspbian case, but I am running Ubuntu, not sure if using legacy iptables as suggested there will work or break things further.
Any help is appreciated! Details below.
NFS server manifests:
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: nfs-server-pod
namespace: storage
labels:
nfs-role: server
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
nfs-role: server
template:
metadata:
labels:
nfs-role: server
spec:
nodeSelector:
storage: enabled
containers:
- name: nfs-server-container
image: ghcr.io/two-tauers/nfs-server:0.1.14
securityContext:
privileged: true
args:
- /storage
volumeMounts:
- name: storage-mount
mountPath: /storage
volumes:
- name: storage-mount
hostPath:
path: /storage
---
kind: Service
apiVersion: v1
metadata:
name: nfs-server
namespace: storage
spec:
selector:
nfs-role: server
type: ClusterIP
ports:
- name: tcp-2049
port: 2049
protocol: TCP
- name: udp-111
port: 111
❯ kubectl logs pod/nfs-server-pod-ccd9c4877-lgd4x -n storage
* Exporting directories for NFS kernel daemon...
...done.
* Starting NFS kernel daemon
...done.
Setting up watches.
Watches established.
❯ kubectl get service/nfs-server -n storage
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nfs-server ClusterIP 10.43.223.27 <none> 2049/TCP,111/TCP 102m
Test pod, not in a running state:
---
kind: Pod
apiVersion: v1
metadata:
name: test-nfs
namespace: storage
spec:
volumes:
- name: nfs-volume
nfs:
server: nfs-server # doesn't work
### server: nfs-server.storage.svc.cluster.local # doesn't work
### server: 10.43.223.27 # works
path: /
containers:
- name: test
image: alpine
volumeMounts:
- name: nfs-volume
mountPath: /var/nfs
command: ["/bin/sh"]
args: ["-c", "while true; do date >> /var/nfs/dates.txt; sleep 5; done"]
Events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 51m default-scheduler Successfully assigned storage/test-nfs to sauron
Warning FailedMount 9m26s (x14 over 48m) kubelet MountVolume.SetUp failed for volume "nfs-volume" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs nfs-server:/ /var/lib/kubelet/pods/d1523d5f-60ec-4990-ae61-369a26b7a6f4/volumes/kubernetes.io~nfs/nfs-volume
Output: mount.nfs: Connection timed out
Warning FailedMount 8m12s (x15 over 49m) kubelet Unable to attach or mount volumes: unmounted volumes=[nfs-volume], unattached volumes=[nfs-volume kube-api-access-pc9bk]: timed out waiting for the condition
Warning FailedMount 3m44s (x5 over 46m) kubelet Unable to attach or mount volumes: unmounted volumes=[nfs-volume], unattached volumes=[kube-api-access-pc9bk nfs-volume]: timed out waiting for the condition
At the same time, nslookup works and coredns does produce a lookup log:
---
apiVersion: v1
kind: Pod
metadata:
name: dnsutils
namespace: storage
spec:
containers:
- name: dnsutils
image: k8s.gcr.io/e2e-test-images/jessie-dnsutils:1.3
command:
- sleep
- "3600"
imagePullPolicy: IfNotPresent
restartPolicy: Always
❯ kubectl exec -i -t dnsutils -n storage -- nslookup nfs-server && kubectl logs pod/coredns-84c56f7bfb-66s84 -n kube-system | tail -1
Server: 10.43.0.10
Address: 10.43.0.10#53
Name: nfs-server.storage.svc.cluster.local
Address: 10.43.223.27
[INFO] 10.42.1.77:37937 - 21822 "A IN nfs-server.storage.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd 106 0.000980753s
❯ kubectl exec -i -t dnsutils -n storage -- nslookup kubernetes.default.svc.cluster.local && kubectl logs pod/coredns-84c56f7bfb-66s84 -n kube-system | tail -1
Server: 10.43.0.10
Address: 10.43.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.43.0.1
[INFO] 10.42.1.77:59077 - 58999 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd 106 0.000496515s

Related

Remove nodeSelectorTerms param

I use this manifest configuration to deploy a registry into 3 mode Kubernetes cluster:
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv1
namespace: registry-space
spec:
capacity:
storage: 5Gi # specify your own size
volumeMode: Filesystem
persistentVolumeReclaimPolicy: Retain
local:
path: /opt/registry # can be any path
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- kubernetes2
accessModes:
- ReadWriteMany # only 1 node will read/write on the path.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pv1-claim
namespace: registry-space
spec: # should match specs added in the PersistenVolume
accessModes:
- ReadWriteMany
volumeMode: Filesystem
resources:
requests:
storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: private-repository-k8s
namespace: registry-space
labels:
app: private-repository-k8s
spec:
replicas: 1
selector:
matchLabels:
app: private-repository-k8s
template:
metadata:
labels:
app: private-repository-k8s
spec:
volumes:
- name: certs-vol
hostPath:
path: /opt/certs
type: Directory
- name: task-pv-storage
persistentVolumeClaim:
claimName: pv1-claim # specify the PVC that you've created. PVC and Deployment must be in same namespace.
containers:
- image: registry:2
name: private-repository-k8s
imagePullPolicy: IfNotPresent
env:
- name: REGISTRY_HTTP_TLS_CERTIFICATE
value: "/opt/certs/registry.crt"
- name: REGISTRY_HTTP_TLS_KEY
value: "/opt/certs/registry.key"
ports:
- containerPort: 5000
volumeMounts:
- name: certs-vol
mountPath: /opt/certs
- name: task-pv-storage
mountPath: /opt/registry
I manually created directories on every node under /opt/certs and /opt/registry.
But when I try to deploy the manifest without hardcoded nodeSelectorTerms on tha control plane I get error:
kubernetes#kubernetes1:/opt/registry$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-58dbc876ff-fsjd5 1/1 Running 1 (74m ago) 84m
kube-system calico-node-5brzt 1/1 Running 1 (73m ago) 84m
kube-system calico-node-nph9n 1/1 Running 1 (76m ago) 84m
kube-system calico-node-pcd74 1/1 Running 1 (74m ago) 84m
kube-system calico-node-ph2ht 1/1 Running 1 (76m ago) 84m
kube-system coredns-565d847f94-7pswp 1/1 Running 1 (74m ago) 105m
kube-system coredns-565d847f94-tlrfr 1/1 Running 1 (74m ago) 105m
kube-system etcd-kubernetes1 1/1 Running 2 (74m ago) 105m
kube-system kube-apiserver-kubernetes1 1/1 Running 2 (74m ago) 105m
kube-system kube-controller-manager-kubernetes1 1/1 Running 2 (74m ago) 105m
kube-system kube-proxy-4slm4 1/1 Running 1 (76m ago) 86m
kube-system kube-proxy-4tnx2 1/1 Running 2 (74m ago) 105m
kube-system kube-proxy-9dgsj 1/1 Running 1 (73m ago) 85m
kube-system kube-proxy-cgr44 1/1 Running 1 (76m ago) 86m
kube-system kube-scheduler-kubernetes1 1/1 Running 2 (74m ago) 105m
registry-space private-repository-k8s-6d5d954b4f-xkmj5 0/1 Pending 0 4m55s
kubernetes#kubernetes1:/opt/registry$
Do you know how I can let Kubernetes to decide where to deploy the pod?
Lets try the following(disregard the paths you currently have and use the ones in the example, (then you can change it), we can adapt it to your needs once dynamic provisioning is working, at the very bottom theres mysql image as an example, use busybox or leave it as it is to get a better understanding:
NFS Server install. Create NFS Share on File Server (Usually master node)
#Include prerequisites
sudo apt update -y # Run updates prior to installing
sudo apt install nfs-kernel-server # Install NFS Server
sudo systemctl enable nfs-server # Set nfs-server to load on startups
sudo systemctl status nfs-server # Check its status
# check server status
root#worker03:/home/brucelee# sudo systemctl status nfs-server
● nfs-server.service - NFS server and services
Loaded: loaded (/lib/systemd/system/nfs-server.service; enabled; vendor preset: enabled)
Active: active (exited) since Fri 2021-08-13 04:25:50 UTC; 18s ago
Process: 2731 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
Process: 2732 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
Main PID: 2732 (code=exited, status=0/SUCCESS)
Aug 13 04:25:49 linux03 systemd[1]: Starting NFS server and services...
Aug 13 04:25:50 linux03 systemd[1]: Finished NFS server and services.
# Prepare an empty folder
sudo su # enter root
nfsShare=/nfs-share
mkdir $nfsShare # create folder if it doesn't exist
chown nobody: $nfsShare
chmod -R 777 $nfsShare # not recommended for production
# Edit the nfs server share configs
vim /etc/exports
# add these lines
/nfs-share x.x.x.x/24(rw,sync,no_subtree_check,no_root_squash,no_all_squash,insecure)
# Export directory and make it available
sudo exportfs -rav
# Verify nfs shares
sudo exportfs -v
# Enable ingress for subnet
sudo ufw allow from x.x.x.x/24 to any port nfs
# Check firewall status - inactive firewall is fine for testing
root#worker03:/home/brucelee# sudo ufw status
Status: inactive
NFS Client install (Worker nodes)
# Install prerequisites
sudo apt update -y
sudo apt install nfs-common
# Mount the nfs share
remoteShare=server.ip.here:/nfs-share
localMount=/mnt/testmount
sudo mkdir -p $localMount
sudo mount $remoteShare $localMount
# Unmount
sudo umount $localMount
Dinamic provisioning and Storage class defaulted
# Pull the source code
workingDirectory=~/nfs-dynamic-provisioner
mkdir $workingDirectory && cd $workingDirectory
git clone https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner
cd nfs-subdir-external-provisioner/deploy
# Deploying the service accounts, accepting defaults
k create -f rbac.yaml
# Editing storage class
vim class.yaml
##############################################
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: managed-nfs-ssd # set this value
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner # or choose another name, must match deployment's env PROVISIONER_NAME'
parameters:
archiveOnDelete: "true" # value of true means retaining data upon pod terminations
allowVolumeExpansion: "true" # this attribute doesn't exist by default
##############################################
# Deploying storage class
k create -f class.yaml
# Sample output
stoic#masternode:~/nfs-dynamic-provisioner/nfs-subdir-external-provisioner/deploy$ k get storageclasses.storage.k8s.io
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
managed-nfs-ssd k8s-sigs.io/nfs-subdir-external-provisioner Delete Immediate false 33s
nfs-class kubernetes.io/nfs Retain Immediate true 193d
nfs-client (default) cluster.local/nfs-subdir-external-provisioner Delete Immediate true 12d
# Example of patching an applied object
kubectl patch storageclass managed-nfs-ssd -p '{"allowVolumeExpansion":true}'
kubectl patch storageclass managed-nfs-ssd -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}' # Set storage class as default
# Editing deployment of dynamic nfs provisioning service pod
vim deployment.yaml
##############################################
apiVersion: apps/v1
kind: Deployment
metadata:
name: nfs-client-provisioner
labels:
app: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: nfs-client-provisioner
template:
metadata:
labels:
app: nfs-client-provisioner
spec:
serviceAccountName: nfs-client-provisioner
containers:
- name: nfs-client-provisioner
image: k8s.gcr.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
volumeMounts:
- name: nfs-client-root
mountPath: /persistentvolumes
env:
- name: PROVISIONER_NAME
value: k8s-sigs.io/nfs-subdir-external-provisioner
- name: NFS_SERVER
value: X.X.X.X # change this value
- name: NFS_PATH
value: /nfs-share # change this value
volumes:
- name: nfs-client-root
nfs:
server: 192.168.100.93 # change this value
path: /nfs-share # change this value
##############################################
# Creating nfs provisioning service pod
k create -f deployment.yaml
# Troubleshooting: example where the deployment was pending variables to be created by rbac.yaml
stoic#masternode: $ k describe deployments.apps nfs-client-provisioner
Name: nfs-client-provisioner
Namespace: default
CreationTimestamp: Sat, 14 Aug 2021 00:09:24 +0000
Labels: app=nfs-client-provisioner
Annotations: deployment.kubernetes.io/revision: 1
Selector: app=nfs-client-provisioner
Replicas: 1 desired | 0 updated | 0 total | 0 available | 1 unavailable
StrategyType: Recreate
MinReadySeconds: 0
Pod Template:
Labels: app=nfs-client-provisioner
Service Account: nfs-client-provisioner
Containers:
nfs-client-provisioner:
Image: k8s.gcr.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
Port: <none>
Host Port: <none>
Environment:
PROVISIONER_NAME: k8s-sigs.io/nfs-subdir-external-provisioner
NFS_SERVER: X.X.X.X
NFS_PATH: /nfs-share
Mounts:
/persistentvolumes from nfs-client-root (rw)
Volumes:
nfs-client-root:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: X.X.X.X
Path: /nfs-share
ReadOnly: false
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetCreated
Available False MinimumReplicasUnavailable
ReplicaFailure True FailedCreate
OldReplicaSets: <none>
NewReplicaSet: nfs-client-provisioner-7768c6dfb4 (0/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 3m47s deployment-controller Scaled up replica set nfs-client-provisioner-7768c6dfb4 to 1
# Get the default nfs storage class
echo $(kubectl get sc -o=jsonpath='{range .items[?(#.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")]}{#.metadata.name}{"\n"}{end}')
PersistentVolumeClaim (Notice the storageClassName it is the one defined on the previous step)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: my-persistentvolume-claim
namespace: default
spec:
storageClassName: nfs-client
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
PersistentVolume
It is created dinamically ! confirm if it is here with the correct values running this command:
kubectl get pv -A
Deployment
On your deployment you need two things, volumeMounts (for each container) and volumes (for all containers).
Notice: VolumeMounts->name=data and volumes->name=data because they should match. And claimName is my-persistentvolume-claim which is the same as you PVC.
...
spec:
containers:
- name: mysql
image: mysql:8.0.30
volumeMounts:
- name: data
mountPath: /var/lib/mysql
subPath: mysql
volumes:
- name: data
persistentVolumeClaim:
claimName: my-persistentvolume-claim

Kubernetes metrics-server FailedDiscoveryCheck

was hoping to get a little help, my Google-Fu didnt get me much closer. I'm trying to install the metrics server for my fedora-coreos kubernetes 4 node cluster like so:
kubectl apply -f deploy/kubernetes/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
the service seems to never start
kubectl describe apiservice v1beta1.metrics.k8s.io
Name: v1beta1.metrics.k8s.io
Namespace:
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"apiregistration.k8s.io/v1beta1","kind":"APIService","metadata":{"annotations":{},"name":"v1beta1.metrics.k8s.io"},"spec":{"...
API Version: apiregistration.k8s.io/v1
Kind: APIService
Metadata:
Creation Timestamp: 2020-03-04T16:53:33Z
Resource Version: 1611816
Self Link: /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
UID: 65d9a56a-c548-4d7e-a647-8ce7a865a266
Spec:
Group: metrics.k8s.io
Group Priority Minimum: 100
Insecure Skip TLS Verify: true
Service:
Name: metrics-server
Namespace: kube-system
Port: 443
Version: v1beta1
Version Priority: 100
Status:
Conditions:
Last Transition Time: 2020-03-04T16:53:33Z
Message: failing or missing response from https://10.3.230.59:443/apis/metrics.k8s.io/v1beta1: bad status from https://10.3.230.59:443/apis/metrics.k8s.io/v1beta1: 403
Reason: FailedDiscoveryCheck
Status: False
Type: Available
Events: <none>
Diagnosing I have found googling around:
kubectl get deploy,svc -n kube-system |egrep metrics-server
deployment.apps/metrics-server 1/1 1 1 8m7s
service/metrics-server ClusterIP 10.3.230.59 <none> 443/TCP 8m7s
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
Error from server (ServiceUnavailable): the server is currently unable to handle the request
kubectl get all --all-namespaces | grep -i metrics-server
kube-system pod/metrics-server-75b5d446cd-zj4jm 1/1 Running 0 9m11s
kube-system service/metrics-server ClusterIP 10.3.230.59 <none> 443/TCP 9m11s
kube-system deployment.apps/metrics-server 1/1 1 1 9m11s
kube-system replicaset.apps/metrics-server-75b5d446cd 1 1 1 9m11s
kubectl logs -f metrics-server-75b5d446cd-zj4jm -n kube-system
I0304 16:53:36.475657 1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
W0304 16:53:38.229267 1 authentication.go:296] Cluster doesn't provide requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
I0304 16:53:38.267760 1 secure_serving.go:116] Serving securely on [::]:4443
kubectl get -n kube-system deployment metrics-server -o yaml | grep -i args -A 10
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"k8s-app":"metrics-server"},"name":"metrics-server","namespace":"kube-system"},"spec":{"selector":{"matchLabels":{"k8s-app":"metrics-server"}},"template":{"metadata":{"labels":{"k8s-app":"metrics-server"},"name":"metrics-server"},"spec":{"containers":[{"args":["--cert-dir=/tmp","--secure-port=4443","--kubelet-insecure-tls","--kubelet-preferred-address-types=InternalIP"],"image":"k8s.gcr.io/metrics-server-amd64:v0.3.6","imagePullPolicy":"IfNotPresent","name":"metrics-server","ports":[{"containerPort":4443,"name":"main-port","protocol":"TCP"}],"securityContext":{"readOnlyRootFilesystem":true,"runAsNonRoot":true,"runAsUser":1000},"volumeMounts":[{"mountPath":"/tmp","name":"tmp-dir"}]}],"nodeSelector":{"beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64"},"serviceAccountName":"metrics-server","volumes":[{"emptyDir":{},"name":"tmp-dir"}]}}}}
creationTimestamp: "2020-03-04T16:53:33Z"
generation: 1
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
resourceVersion: "1611810"
selfLink: /apis/apps/v1/namespaces/kube-system/deployments/metrics-server
uid: 006e758e-bd33-47d7-8378-d3a8081ee8a8
spec:
--
- args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
image: k8s.gcr.io/metrics-server-amd64:v0.3.6
imagePullPolicy: IfNotPresent
name: metrics-server
ports:
- containerPort: 4443
name: main-port
finally my deployment config:
spec:
selector:
matchLabels:
k8s-app: metrics-server
template:
metadata:
name: metrics-server
labels:
k8s-app: metrics-server
spec:
serviceAccountName: metrics-server
volumes:
# mount in tmp so we can safely use from-scratch images and/or read-only containers
- name: tmp-dir
emptyDir: {}
containers:
- name: metrics-server
image: k8s.gcr.io/metrics-server-amd64:v0.3.6
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
ports:
- name: main-port
containerPort: 4443
protocol: TCP
securityContext:
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
imagePullPolicy: IfNotPresent
volumeMounts:
- name: tmp-dir
mountPath: /tmp
hostNetwork: true
nodeSelector:
beta.kubernetes.io/os: linux
kubernetes.io/arch: "amd64"
I'm at a loss of what it could be getting the metrics service to start and just get the basic kubectl top node to display any info all I get is
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
I have searched the internet and tried adding the args: and command: lines but no luck
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
Can anyone shed light on how to fix this? Thanks
Pastebin log file
Log File
I've reproduced your issue. I have used Calico as CNI.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fedora-master Ready master 6m27s v1.17.3
fedora-worker-1 Ready <none> 4m48s v1.17.3
fedora-worker-2 Ready <none> 4m46s v1.17.3
fedora-master:~/metrics-server$ kubectl describe apiservice v1beta1.metrics.k8s.io
Status:
Conditions:
Last Transition Time: 2020-03-12T16:04:59Z
Message: failing or missing response from https://10.99.122.196:443/apis/metrics.k8s.io/v
1beta1: Get https://10.99.122.196:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting
for connection (Client.Timeout exceeded while awaiting headers)
fedora-master:~/metrics-server$ kubectl top pod
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
When you have only one node in cluster, default settings in metrics-server repo works correctly. Issue occurs when you have more than 2 nodes. Ive used 1 master and 2 workers to reproduce. Below example deployment which works correct (have all required args). Before, please remove your current metrics-server YAMLs (kubectl delete -f deploy/kubernetes) and execute:
$ git clone https://github.com/kubernetes-sigs/metrics-server
$ cd metrics-server/deploy/kubernetes/
$ vi metrics-server-deployment.yaml
Paste below YAML:
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: metrics-server
namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: metrics-server
namespace: kube-system
labels:
k8s-app: metrics-server
spec:
selector:
matchLabels:
k8s-app: metrics-server
template:
metadata:
name: metrics-server
labels:
k8s-app: metrics-server
spec:
serviceAccountName: metrics-server
volumes:
# mount in tmp so we can safely use from-scratch images and/or read-only containers
- name: tmp-dir
emptyDir: {}
hostNetwork: true
containers:
- name: metrics-server
image: k8s.gcr.io/metrics-server-amd64:v0.3.6
imagePullPolicy: IfNotPresent
args:
- /metrics-server
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls
- --cert-dir=/tmp
- --secure-port=4443
ports:
- name: main-port
containerPort: 4443
protocol: TCP
securityContext:
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- name: tmp-dir
mountPath: /tmp
nodeSelector:
kubernetes.io/os: linux
kubernetes.io/arch: "amd64"
save and quit using :wq
$ cd ~/metrics-server
$ kubectl apply -f deploy/kubernetes/
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
Wait a while for metrics-server to gather a few metrics from nodes.
$ kubectl describe apiservice v1beta1.metrics.k8s.io
Name: v1beta1.metrics.k8s.io
Namespace:
...
Metadata:
Creation Timestamp: 2020-03-12T16:57:58Z
...
Spec:
Group: metrics.k8s.io
Group Priority Minimum: 100
Insecure Skip TLS Verify: true
Service:
Name: metrics-server
Namespace: kube-system
Port: 443
Version: v1beta1
Version Priority: 100
Status:
Conditions:
Last Transition Time: 2020-03-12T16:58:01Z
Message: all checks passed
Reason: Passed
Status: True
Type: Available
Events: <none>
after a few minutes you can use top.
$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
fedora-master 188m 9% 1315Mi 17%
fedora-worker-1 109m 5% 982Mi 13%
fedora-worker-2 84m 4% 969Mi 13%
If you will still encounter some issues, please add - --v=6 to deployment and provide logs from metrics-server pod.
containers:
- name: metrics-server
image: k8s.gcr.io/metrics-server-amd64:v0.3.1
args:
- /metrics-server
- --v=6
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls
You need to carefully check logs for calico-node pods. In my case i have some other network interfaces and the autodetection mechanism in calico was detecting wrong interface (ip address). You need to consult this documentation https://projectcalico.docs.tigera.io/reference/node/configuration.
What i did in my case, was simply:
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=cidr=172.16.8.0/24
cidr is my "working network". After this, all calico-nodes restarted and suddenly everything was fine.

Error: failed to prepare subPath for volumeMount "postgres-storage" of container "postgres"

I am trying to use persistent volume claims and facing this issue
This is my postgres-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: postgres-deployment
spec:
replicas: 1
selector:
matchLabels:
component: postgres
template:
metadata:
labels:
component: postgres
spec:
volumes:
- name: postgres-storage
persistentVolumeClaim:
claimName: database-persistent-volume-claim
containers:
- name: postgres
image: postgres
ports:
- containerPort: 5432
volumeMounts:
- mountPath: /var/lib/postgresql/data
name: postgres-storage
subPath: postgres
when i debug pod using describe
kubectl describe pod postgres-deployment-8576df7bfc-8mp5t
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m4s default-scheduler Successfully assigned default/postgres-deployment-8576df7bfc-8mp5t to docker-desktop
Normal Pulled 67s (x8 over 2m58s) kubelet, docker-desktop Successfully pulled image "postgres"
Warning Failed 67s (x8 over 2m58s) kubelet, docker-desktop Error: failed to prepare subPath for volumeMount "postgres-storage" of container "postgres"
Normal Pulling 53s (x9 over 3m3s) kubelet, docker-desktop Pulling image "postgres"
My pod is showing me this error
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
postgres-deployment-8576df7bfc-8mp5t 0/1 CreateContainerConfigError 0 5m5
I am not sure where is the problem in the config. but I am sure this is related to volumes because after adding volumes this problem appears
remove subpath. can you try below yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: postgres-deployment
spec:
replicas: 1
selector:
matchLabels:
component: postgres
template:
metadata:
labels:
component: postgres
spec:
volumes:
- name: postgres-storage
persistentVolumeClaim:
claimName: database-persistent-volume-claim
containers:
- name: postgres
image: postgres
ports:
- containerPort: 5432
volumeMounts:
- mountPath: /var/lib/postgresql/data
name: postgres-storage
I just deployed and it works
master $ kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
postgres-deployment 1/1 1 1 4m13s
master $ kubectl get po
NAME READY STATUS RESTARTS AGE
postgres-deployment-6b66bdd748-5q76h 1/1 Running 0 4m13s

rabbitmq kubernetes with NFS mount

I tried to set up a rabbitmq cluster in a kubernetes envirnoment that has NFS PVs with the help of this tutorial. Unfortunately it seems like the rabbitmq wants to change the owner of /usr/lib/rabbitmq, but when I have a NFS directory mounted there, I get an error:
$ kubectl logs rabbitmq-0 -f
chown: /var/lib/rabbitmq: Operation not permitted
chown: /var/lib/rabbitmq: Operation not permitted
I guess I have two options: fork the rabbitmq and remove the chown and build my own images or make kubernetes/nfs work nicely. I would not like to make my own fork and getting kubernetes/nfs working nicely does not sound like it should be my problem. Any other ideas?
This is what i tried to reproduce this issue.
I was installed kubernetes cluster using kubeadm on redhat 7 and below is the cluster ,node details
ENVIRONMENT DETAILS:
[root#master tmp]# kubectl cluster-info
Kubernetes master is running at https://192.168.56.4:6443
KubeDNS is running at https://192.168.56.4:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root#master tmp]#
[root#master tmp]# kubectl get no
NAME STATUS ROLES AGE VERSION
master.k8s Ready master 8d v1.16.2
node1.k8s Ready <none> 7d22h v1.16.3
node2.k8s Ready <none> 7d21h v1.16.3
[root#master tmp]#
First i have set the nfs configuration on both master and worker nodes by running below steps on both master and worker nodes.here master node is nfs server and both worker nodes are nfs clients
NFS SETUP:
yum install nfs-utils nfs-utils-lib =============================================================>>>>> on nfs server,client
yum install portmap =============================================================>>>>> on nfs server,client
mkdir /nfsroot =============================>>>>>>>>>>>>>>>>>>on nfs server
[root#master ~]# cat /etc/exports =============================================================>>>>> on nfs server
/nfsroot 192.168.56.5/255.255.255.0(rw,sync,no_root_squash)
/nfsroot 192.168.56.6/255.255.255.0(rw,sync,no_root_squash)
exportfs -r =============================================================>>>>> on nfs server
service nfs start =============================================================>>>>> on nfs server,client
showmount -e =============================================================>>>>> on nfs server,client
Now nfs setup is ready and will apply rabbitmq k8s setup
RABBITMQ K8S SETUP:
First step is to create persistent volumes using the nfs mount which we created in above step
[root#master tmp]# cat /root/rabbitmq-pv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
name: rabbitmq-pv-1
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
nfs:
server: 192.168.56.4
path: /nfsroot
capacity:
storage: 1Mi
persistentVolumeReclaimPolicy: Recycle
---
kind: PersistentVolume
apiVersion: v1
metadata:
name: rabbitmq-pv-2
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
nfs:
server: 192.168.56.4
path: /nfsroot
capacity:
storage: 1Mi
persistentVolumeReclaimPolicy: Recycle
---
kind: PersistentVolume
apiVersion: v1
metadata:
name: rabbitmq-pv-3
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
nfs:
server: 192.168.56.4
path: /nfsroot
capacity:
storage: 1Mi
persistentVolumeReclaimPolicy: Recycle
---
kind: PersistentVolume
apiVersion: v1
metadata:
name: rabbitmq-pv-4
spec:
accessModes:
- ReadWriteOnce
- ReadOnlyMany
nfs:
server: 192.168.56.4
path: /nfsroot
capacity:
storage: 1Mi
persistentVolumeReclaimPolicy: Recycle
After applied the above manifest ,it created pv's as below
[root#master ~]# kubectl apply -f rabbitmq-pv.yaml
persistentvolume/rabbitmq-pv-1 created
persistentvolume/rabbitmq-pv-2 created
persistentvolume/rabbitmq-pv-3 created
persistentvolume/rabbitmq-pv-4 created
[root#master ~]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
rabbitmq-pv-1 1Mi RWO,ROX Recycle Available 5s
rabbitmq-pv-2 1Mi RWO,ROX Recycle Available 5s
rabbitmq-pv-3 1Mi RWO,ROX Recycle Available 5s
rabbitmq-pv-4 1Mi RWO,ROX Recycle Available 5s
[root#master ~]#
No need to create persistentvolumeclaim ,since it will be automatically taken care while running statefulset manifest by volumeclaimtemplate option
now lets create the secret which you have mentioned as below
[root#master tmp]# kubectl create secret generic rabbitmq-config --from-literal=erlang-cookie=c-is-for-cookie-thats-good-enough-for-me
secret/rabbitmq-config created
[root#master tmp]#
[root#master tmp]# kubectl get secrets
NAME TYPE DATA AGE
default-token-vjsmd kubernetes.io/service-account-token 3 8d
jp-token-cfdzx kubernetes.io/service-account-token 3 5d2h
rabbitmq-config Opaque 1 39m
[root#master tmp]#
Now let submit your rabbitmq manifest by make changes of replacing all loadbalancer service type to nodeport service,since we are not using any cloudprovider environment.Also replace the volume names to rabbitmq-pv,which we have created in pv step.reduced the size from 1Gi to 1Mi,since it is just testing demo
apiVersion: v1
kind: Service
metadata:
# Expose the management HTTP port on each node
name: rabbitmq-management
labels:
app: rabbitmq
spec:
ports:
- port: 15672
name: http
selector:
app: rabbitmq
sessionAffinity: ClientIP
type: NodePort
---
apiVersion: v1
kind: Service
metadata:
# The required headless service for StatefulSets
name: rabbitmq
labels:
app: rabbitmq
spec:
ports:
- port: 5672
name: amqp
- port: 4369
name: epmd
- port: 25672
name: rabbitmq-dist
clusterIP: None
selector:
app: rabbitmq
---
apiVersion: v1
kind: Service
metadata:
# The required headless service for StatefulSets
name: rabbitmq-cluster
labels:
app: rabbitmq
spec:
ports:
- port: 5672
name: amqp
- port: 4369
name: epmd
- port: 25672
name: rabbitmq-dist
type: NodePort
selector:
app: rabbitmq
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: rabbitmq
spec:
serviceName: "rabbitmq"
selector:
matchLabels:
app: rabbitmq
replicas: 4
template:
metadata:
labels:
app: rabbitmq
spec:
terminationGracePeriodSeconds: 10
containers:
- name: rabbitmq
image: rabbitmq:3.6.6-management-alpine
lifecycle:
postStart:
exec:
command:
- /bin/sh
- -c
- >
if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new;
cat /etc/resolv.conf.new > /etc/resolv.conf;
rm /etc/resolv.conf.new;
fi;
until rabbitmqctl node_health_check; do sleep 1; done;
if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
rabbitmqctl stop_app;
rabbitmqctl join_cluster rabbit#rabbitmq-0;
rabbitmqctl start_app;
fi;
rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
env:
- name: RABBITMQ_ERLANG_COOKIE
valueFrom:
secretKeyRef:
name: rabbitmq-config
key: erlang-cookie
ports:
- containerPort: 5672
name: amqp
- containerPort: 25672
name: rabbitmq-dist
volumeMounts:
- name: rabbitmq-pv
mountPath: /var/lib/rabbitmq
volumeClaimTemplates:
- metadata:
name: rabbitmq-pv
annotations:
volume.alpha.kubernetes.io/storage-class: default
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Mi # make this bigger in production
After submitted the pod manifest,able to see statefulsets,pods are created
[root#master tmp]# kubectl apply -f rabbitmq.yaml
service/rabbitmq-management created
service/rabbitmq created
service/rabbitmq-cluster created
statefulset.apps/rabbitmq created
[root#master tmp]#
NAME READY STATUS RESTARTS AGE
rabbitmq-0 1/1 Running 0 18m
rabbitmq-1 1/1 Running 0 17m
rabbitmq-2 1/1 Running 0 13m
rabbitmq-3 1/1 Running 0 13m
[root#master ~]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
rabbitmq-pv-rabbitmq-0 Bound rabbitmq-pv-1 1Mi RWO,ROX 49m
rabbitmq-pv-rabbitmq-1 Bound rabbitmq-pv-3 1Mi RWO,ROX 48m
rabbitmq-pv-rabbitmq-2 Bound rabbitmq-pv-2 1Mi RWO,ROX 44m
rabbitmq-pv-rabbitmq-3 Bound rabbitmq-pv-4 1Mi RWO,ROX 43m
[root#master ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rabbitmq ClusterIP None <none> 5672/TCP,4369/TCP,25672/TCP 49m
rabbitmq-cluster NodePort 10.102.250.172 <none> 5672:30574/TCP,4369:31757/TCP,25672:31854/TCP 49m
rabbitmq-management NodePort 10.108.131.46 <none> 15672:31716/TCP 49m
[root#master ~]#
Now i tried to hit the rabbitmq management page using nodeport service by http://192.168.56.6://31716 and able to get the login page
So please let me know if you still face chown issue after you tried like above,so that we can see further by checking podsecuritypolicies applied or not

Why is my persistent disk failing to mount to my pod?

I've been trying to find out why my pod won't start up, when I go to describe it I get this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m default-scheduler Successfully assigned my-namespace/nfs to gke-default-pool
Normal SuccessfulAttachVolume 2m attachdetach-controller AttachVolume.Attach succeeded for volume "nfspvc"
Warning FailedMount 50s kubelet, default-pool Unable to mount volumes for pod "nfs-8496cc5fd5-wjkm2_sxdb-branch161666(28ff323d-8839-11e9-a080-42010a8400bb)": timeout expired waiting for volumes to attach or mount for pod "my-namespace"/"nfs-8496cc5fd5-wjkm2". list of unmounted volumes=[nfspvc]. list of unattached volumes=[nfspvc default-token-ntmfv]
Warning FailedMount 28s (x9 over 2m) kubelet, default-pool MountVolume.MountDevice failed for volume "nfspvc" : executable file not found in $PATH
When I describe the compute disk, all looks good:
creationTimestamp: '2019-06-06T01:56:31.079-07:00'
id: '5701286856735681489'
kind: compute#disk
labelFingerprint: 42WmSpB8rSM=
lastAttachTimestamp: '2019-06-06T01:57:51.852-07:00'
name: nfs-pd
physicalBlockSizeBytes: '4096'
selfLink: <omitted>
sizeGb: '10'
status: READY
type: <omitted>
users:
- <omitted>
zone: <omitted>
Updates are available for some Cloud SDK components. To install them,
please run:
$ gcloud components update
And here is my pods manifest:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
labels:
role: nfs
name: nfs
namespace: my-namespace
spec:
replicas: 1
selector:
matchLabels:
role: nfs
spec:
containers:
- image: gcr.io/google_containers/volume-nfs:0.8
name: nfs
ports:
- containerPort: 2049
name: nfs
protocol: TCP
- containerPort: 20048
name: mountd
protocol: TCP
- containerPort: 111
name: rpcbind
protocol: TCP
volumeMounts:
- mountPath: /exports
name: nfspvc
restartPolicy: Always
volumes:
- gcePersistentDisk:
fsType: ext4i
pdName: nfs-pd
name: nfspvc
I'm not really sure what 'MountVolume.MountDevice failed for volume "nfspvc" : executable file not found in $PATH' means, or what else I should look in to, to investigate the source of the issue?
If it makes any difference, this is created by a script, the ordering of which is:
Create the compute disk
Run helm, which in turn creates the above deployment, service, persistent volume, and persistent volume claim.
fsType: ext4i
Try removing that 'i'.