Kubernetes: Failed to get GCE GCECloudProvider with error <nil>

I have set up a custom kubernetes cluster on GCE using kubeadm. I am trying to use StatefulSets with persistent storage.
I have the following configuration:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gce-slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zones: europe-west3-b
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myname
  labels:
    app: myapp
spec:
  serviceName: myservice
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: mycontainer
          image: ubuntu:16.04
          env:
          volumeMounts:
            - name: myapp-data
              mountPath: /srv/data
      imagePullSecrets:
        - name: sitesearch-secret
  volumeClaimTemplates:
    - metadata:
        name: myapp-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: gce-slow
        resources:
          requests:
            storage: 1Gi
And I get the following error:
Nopx@vm0:~$ kubectl describe pvc
Name: myapp-data-myname-0
Namespace: default
StorageClass: gce-slow
Status: Pending
Volume:
Labels: app=myapp
Annotations: volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/gce-pd
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 5s persistentvolume-controller Failed to provision volume
with StorageClass "gce-slow": Failed to get GCE GCECloudProvider with error <nil>
I am poking around in the dark and do not know what is missing. It seems logical that it doesn't work, since the provisioner never authenticates to GCE. Any hints and pointers are very much appreciated.
EDIT
I tried the solution here, by editing the config file in kubeadm with kubeadm config upload from-file; however, the error persists. The kubeadm config looks like this right now:
api:
  advertiseAddress: 10.156.0.2
  bindPort: 6443
  controlPlaneEndpoint: ""
auditPolicy:
  logDir: /var/log/kubernetes/audit
  logMaxAge: 2
  path: ""
authorizationModes:
- Node
- RBAC
certificatesDir: /etc/kubernetes/pki
cloudProvider: gce
criSocket: /var/run/dockershim.sock
etcd:
  caFile: ""
  certFile: ""
  dataDir: /var/lib/etcd
  endpoints: null
  image: ""
  keyFile: ""
imageRepository: k8s.gcr.io
kubeProxy:
  config:
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: 192.168.0.0/16
    configSyncPeriod: 15m0s
    conntrack:
      max: null
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      minSyncPeriod: 0s
      scheduler: ""
      syncPeriod: 30s
    metricsBindAddress: 127.0.0.1:10249
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    resourceContainer: /kube-proxy
    udpIdleTimeout: 250ms
kubeletConfiguration: {}
kubernetesVersion: v1.10.2
networking:
  dnsDomain: cluster.local
  podSubnet: 192.168.0.0/16
  serviceSubnet: 10.96.0.0/12
nodeName: mynode
privilegedPods: false
token: ""
tokenGroups:
- system:bootstrappers:kubeadm:default-node-token
tokenTTL: 24h0m0s
tokenUsages:
- signing
- authentication
unifiedControlPlaneImage: ""
Edit
The issue was resolved in the comments thanks to Anton Kostenko. The last edit coupled with kubeadm upgrade solves the problem.

The answer took me a while but here it is:
Using the GCECloudProvider in Kubernetes outside of the Google Kubernetes Engine has the following prerequisites (the last point is Kubeadm specific):
The VM needs to be run with a service account that has the right to provision disks. Info on how to run a VM with a service account can be found in the GCE documentation (a gcloud sketch follows after this list).
The Kubelet needs to run with the argument --cloud-provider=gce. For this, the KUBELET_KUBECONFIG_ARGS in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf has to be edited. The Kubelet can then be restarted with
sudo systemctl restart kubelet
The Kubernetes cloud-config file needs to be configured. The file can be found at /etc/kubernetes/cloud-config and the following content is enough to get the cloud provider to work:
[Global]
project-id = "<google-project-id>"
Kubeadm needs to have GCE configured as its cloud provider. The config posted in the question works fine for this. However, the nodeName has to be changed.
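As a rough sketch of that first point (the instance and service-account names below are placeholders, not from the question), the service account can be attached with gcloud; the instance has to be stopped first:
gcloud compute instances stop my-k8s-node --zone europe-west3-b
gcloud compute instances set-service-account my-k8s-node \
  --zone europe-west3-b \
  --service-account gce-provisioner@<google-project-id>.iam.gserviceaccount.com \
  --scopes https://www.googleapis.com/auth/compute
gcloud compute instances start my-k8s-node --zone europe-west3-b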

Create dynamic persistent volumes on Kubernetes nodes running on Google Cloud virtual machines.
GCP role:
In the Google Cloud console, go to IAM & Admin.
Add a new service account, e.g. gce-user.
Add the role "Compute Instance Admin".
Attach the service account to the GCP VM:
Stop the instance and click Edit.
Click "Service account" and select the new account, e.g. gce-user.
Start the virtual machine.
Add the GCE parameter to the kubelet on all nodes.
Add "--cloud-provider=gce":
sudo vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Add the value:
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=gce"
Create a new file /etc/kubernetes/cloud-config on all nodes and add this parameter:
[Global]
project-id = "xxxxxxxxxxxx"
Restart the kubelet.
Add gce to the controller manager:
vi /etc/kubernetes/manifests/kube-controller-manager.yaml
Add this parameter under command:
--cloud-provider=gce
Then restart the control plane.
Run ps -ef | grep controller; you should see "gce" in the controller-manager output.
Note: the above method is not recommended on a production system; use the kubeadm config to update the controller-manager settings (a sketch follows below).
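For that recommended route, a rough sketch of such a kubeadm config (assuming a newer kubeadm with the v1beta2 config API; adjust to the version actually in use) could look like this and be applied through kubeadm's config/upgrade workflow instead of hand-editing the manifest:
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  extraArgs:
    cloud-provider: gce
controllerManager:
  extraArgs:
    cloud-provider: gce
    cloud-config: /etc/kubernetes/cloud-config
  extraVolumes:
  - name: cloud-config
    hostPath: /etc/kubernetes/cloud-config
    mountPath: /etc/kubernetes/cloud-config
    readOnly: true
    pathType: File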


kubernetes aws-efs-csi-driver and permissions

I'm using the bitnami/etcd chart, which can create snapshots via an EFS-mounted PVC.
However, I get a permission error after the aws-efs-csi-driver is provisioned and the PVC is mounted in any non-root pod (uid/gid is 1001).
I'm using the Helm chart https://kubernetes-sigs.github.io/aws-efs-csi-driver/ version 2.2.0.
Values of the chart:
# you can obtain the fileSystemId with
# aws efs describe-file-systems --query "FileSystems[*].FileSystemId"
storageClasses:
- name: efs
  parameters:
    fileSystemId: fs-exxxxxxx
    directoryPerms: "777"
    gidRangeStart: "1000"
    gidRangeEnd: "2000"
    basePath: "/snapshots"
# enable it after the following issue is resolved
# https://github.com/bitnami/charts/issues/7769
# node:
#   nodeSelector:
#     etcd: "true"
I then manually created the PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: etcd-snapshotter-pv
  annotations:
    argocd.argoproj.io/sync-wave: "60"
spec:
  capacity:
    storage: 32Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-exxxxxxx
Then if I mount that EFS PVC in a non-root pod, I get the following error:
➜ klo etcd-snapshotter-001-ph8w9
etcd 23:18:38.76 DEBUG ==> Using endpoint etcd-snapshotter-001-ph8w9:2379
{"level":"warn","ts":1633994320.7789018,"logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0005ea380/#initially=[etcd-snapshotter-001-ph8w9:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.120.2.206:2379: connect: connection refused\""}
etcd-snapshotter-001-ph8w9:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
etcd 23:18:40.78 WARN ==> etcd endpoint etcd-snapshotter-001-ph8w9:2379 not healthy. Trying a different endpoint
etcd 23:18:40.78 DEBUG ==> Using endpoint etcd-2.etcd-headless.etcd.svc.cluster.local:2379
etcd-2.etcd-headless.etcd.svc.cluster.local:2379 is healthy: successfully committed proposal: took = 1.6312ms
etcd 23:18:40.87 INFO ==> Snapshotting the keyspace
Error: could not open /snapshots/db-2021-10-11_23-18.part (open /snapshots/db-2021-10-11_23-18.part: permission denied)
As a result I have to spawn a new "root" pod, get inside it, and manually adjust the permissions:
apiVersion: v1
kind: Pod
metadata:
  name: perm
spec:
  securityContext:
    runAsUser: 0
    runAsGroup: 0
    fsGroup: 0
  containers:
  - name: app1
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "sleep 3000"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /snapshots
    securityContext:
      runAsUser: 0
      runAsGroup: 0
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: etcd-snapshotter
  nodeSelector:
    etcd: "true"
k apply -f setup.yaml
k exec -ti perm -- ash
cd /snapshots
/snapshots # chown -R 1001.1001 .
/snapshots # chmod -R 777 .
/snapshots # exit
➜ k create job --from=cronjob/etcd-snapshotter etcd-snapshotter-001
job.batch/etcd-snapshotter-001 created
➜ klo etcd-snapshotter-001-bmv79
etcd 23:31:10.22 DEBUG ==> Using endpoint etcd-1.etcd-headless.etcd.svc.cluster.local:2379
etcd-1.etcd-headless.etcd.svc.cluster.local:2379 is healthy: successfully committed proposal: took = 2.258532ms
etcd 23:31:10.32 INFO ==> Snapshotting the keyspace
{"level":"info","ts":1633995070.4244702,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"/snapshots/db-2021-10-11_23-31.part"}
{"level":"info","ts":1633995070.4907935,"logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1633995070.4908395,"caller":"snapshot/v3_snapshot.go:76","msg":"fetching snapshot","endpoint":"etcd-1.etcd-headless.etcd.svc.cluster.local:2379"}
{"level":"info","ts":1633995070.4965465,"logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":1633995070.544217,"caller":"snapshot/v3_snapshot.go:91","msg":"fetched snapshot","endpoint":"etcd-1.etcd-headless.etcd.svc.cluster.local:2379","size":"320 kB","took":"now"}
{"level":"info","ts":1633995070.5507936,"caller":"snapshot/v3_snapshot.go:100","msg":"saved","path":"/snapshots/db-2021-10-11_23-31"}
Snapshot saved at /snapshots/db-2021-10-11_23-31
➜ k exec -ti perm -- ls -la /snapshots
total 924
drwxrwxrwx 2 1001 1001 6144 Oct 11 23:31 .
drwxr-xr-x 1 root root 46 Oct 11 23:25 ..
-rw------- 1 1001 root 319520 Oct 11 23:31 db-2021-10-11_23-31
Is there a way to automate this?
I have this setting in the storage class
gidRangeStart: "1000"
gidRangeEnd: "2000"
but it has no effect.
PVC is defined as:
➜ kg pvc etcd-snapshotter -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: efs.csi.aws.com
  name: etcd-snapshotter
  namespace: etcd
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 32Gi
  storageClassName: efs
  volumeMode: Filesystem
  volumeName: etcd-snapshotter-pv
By default the StorageClass field provisioningMode is unset; set it to provisioningMode: "efs-ap" to enable dynamic provisioning with access points.
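A minimal sketch of the chart values with that field added (reusing the values from the question; the filesystem ID stays a placeholder):
storageClasses:
- name: efs
  parameters:
    provisioningMode: efs-ap
    fileSystemId: fs-exxxxxxx
    directoryPerms: "777"
    gidRangeStart: "1000"
    gidRangeEnd: "2000"
    basePath: "/snapshots"
With efs-ap, the driver creates an EFS access point per volume and owns the directory with a uid/gid from the configured range, so the non-root pod (uid/gid 1001) can write without the manual chown workaround.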

Creating a link to an NFS share in K3s Kubernetes

I'm very new to Kubernetes and am trying to get Node-RED running on a small cluster of Raspberry Pis.
I happily managed that, but noticed that once the cluster is powered down, the next time I bring it up the flows in Node-RED have vanished.
So I've created an NFS share on a FreeNAS box on my local network and can mount it from another RPi, so I know the permissions work.
However, I cannot get the mount to work in a Kubernetes deployment.
Any help as to where I have gone wrong please?
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-red
  labels:
    app: node-red
spec:
  replicas: 1
  selector:
    matchLabels:
      app: node-red
  template:
    metadata:
      labels:
        app: node-red
    spec:
      containers:
      - name: node-red
        image: nodered/node-red:latest
        ports:
        - containerPort: 1880
          name: node-red-ui
        securityContext:
          privileged: true
        volumeMounts:
        - name: node-red-data
          mountPath: /data
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: TZ
          value: Europe/London
      volumes:
      - name: node-red-data
      nfs:
        server: 192.168.1.96
        path: /mnt/Pool1/ClusterStore/nodered
The error I am getting is
error: error validating "node-red-deploy.yml": error validating data:
ValidationError(Deployment.spec.template.spec): unknown field "nfs" in io.k8s.api.core.v1.PodSpec; if
you choose to ignore these errors, turn validation off with --validate=false
New Information
I now have the following
apiVersion: v1
kind: PersistentVolume
metadata:
  name: clusterstore-nodered
  labels:
    type: nfs
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteMany
  nfs:
    path: /mnt/Pool1/ClusterStore/nodered
    server: 192.168.1.96
  persistentVolumeReclaimPolicy: Recycle
claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clusterstore-nodered-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
Now when I start the deployment it waits at Pending forever, and I see the following in the events for the PVC:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 5m47s (x7 over 7m3s) persistentvolume-controller waiting for first consumer to be created before binding
Normal Provisioning 119s (x5 over 5m44s) rancher.io/local-path_local-path-provisioner-58fb86bdfd-rtcls_506528ac-afd0-11ea-930d-52d0b85bb2c2 External provisioner is provisioning volume for claim "default/clusterstore-nodered-claim"
Warning ProvisioningFailed 119s (x5 over 5m44s) rancher.io/local-path_local-path-provisioner-58fb86bdfd-rtcls_506528ac-afd0-11ea-930d-52d0b85bb2c2 failed to provision volume with StorageClass "local-path": Only support ReadWriteOnce access mode
Normal ExternalProvisioning 92s (x19 over 5m44s) persistentvolume-controller
waiting for a volume to be created, either by external provisioner "rancher.io/local-path" or manually created by system administrator
I assume that this is because I don't have an NFS provisioner; in fact, if I do kubectl get storageclass I only see local-path.
New question: how do I add a storageclass for NFS? A little googling around has left me without a clue.
OK, solved the issue. Kubernetes tutorials are really esoteric and missing lots of assumed steps.
My problem was down to K3s on the Pi only shipping with a local-path storage provisioner.
I finally found a tutorial that installed an NFS client storage provisioner, and now my cluster works!
This was the tutorial I found the information in.
In the stated tutorial there are basically these steps to follow:
1.
showmount -e 192.168.1.XY
to check if the share is reachable from outside the NAS
2.
helm install nfs-provisioner stable/nfs-client-provisioner --set nfs.server=192.168.1.**XY** --set nfs.path=/samplevolume/k3s --set image.repository=quay.io/external_storage/nfs-client-provisioner-arm
Replace the IP with your NFS server and the NFS path with your specific path on your Synology (both should be visible from your showmount -e IP command).
Update 23.02.2021
It seems that you have to use another Chart and Image too:
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner --set nfs.server=192.168.1.**XY** --set nfs.path=/samplevolume/k3s --set image.repository=gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner
3.
kubectl get storageclass
To check if the storageclass now exists
4.
kubectl patch storageclass nfs-client -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}' && kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
To configure the new storage class as "default". Replace nfs-client and local-path with what kubectl get storageclass tells you.
5.
kubectl get storageclass
Final check that it's marked as "default" (a quick test claim is sketched below).
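To sanity-check the new provisioner end to end, a throwaway claim along these lines (the name is arbitrary) should go to Bound shortly after creation:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-test-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
kubectl apply -f nfs-test-claim.yaml && kubectl get pvc nfs-test-claim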
This is a validation error pointing at the very last part of your Deployment yaml, therefore making it an invalid object. It looks like you've made a mistake with indentations. It should look more like this:
  volumes:
  - name: node-red-data
    nfs:
      server: 192.168.1.96
      path: /mnt/Pool1/ClusterStore/nodered
Also, as you are new to Kubernetes, I strongly recommend getting familiar with the concepts of PersistentVolumes and their claims. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV.
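For example, once a claim like the clusterstore-nodered-claim from your edit is bound, the Deployment could reference it instead of an inline nfs volume, roughly like this:
  volumes:
  - name: node-red-data
    persistentVolumeClaim:
      claimName: clusterstore-nodered-claim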
Please let me know if that helped.

Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.10.0.10": field is immutable

I set up my cluster with kubeadm. At the last step I ran kubeadm init --config kubeadm.conf --v=5. I get an error about the clusterIP value. Here is part of the output:
I0220 00:16:27.625920 31630 clusterinfo.go:79] creating the RBAC rules for exposing the cluster-info ConfigMap in the kube-public namespace
I0220 00:16:27.947941 31630 kubeletfinalize.go:88] [kubelet-finalize] Assuming that kubelet client certificate rotation is enabled: found "/var/lib/kubelet/pki/kubelet-client-current.pem"
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
I0220 00:16:27.949398 31630 kubeletfinalize.go:132] [kubelet-finalize] Restarting the kubelet to enable client certificate rotation
[addons]: Migrating CoreDNS Corefile
I0220 00:16:28.447420 31630 dns.go:381] the CoreDNS configuration has been migrated and applied: .:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
.
I0220 00:16:28.447465 31630 dns.go:382] the old migration has been saved in the CoreDNS ConfigMap under the name [Corefile-backup]
I0220 00:16:28.447486 31630 dns.go:383] The changes in the new CoreDNS Configuration are as follows:
Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.10.0.10": field is immutable
unable to create/update the DNS service
k8s.io/kubernetes/cmd/kubeadm/app/phases/addons/dns.createDNSService
/workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/addons/dns/dns.go:323
k8s.io/kubernetes/cmd/kubeadm/app/phases/addons/dns.createCoreDNSAddon
/workspace/anago-v1.17.0-rc.2.10+70132b0f130acc/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/addons/dns/dns.go:305
k8s.io/kubernetes/cmd/kubeadm/app/phases/addons/dns.coreDNSAddon
And my config file looks like this:
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.16.5.151
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: master02
  # taints:
  # - effect: NoSchedule
  #   key: node-role.kubernetes.io/master
---
apiServer:
  certSANs:
  - "172.16.5.150"
  - "172.16.5.151"
  - "172.16.5.152"
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  external:
    endpoints:
    - "https://172.16.5.150:2379"
    - "https://172.16.5.151:2379"
    - "https://172.16.5.152:2379"
    caFile: /etc/k8s/pki/ca.pem
    certFile: /etc/k8s/pki/etcd.pem
    keyFile: /etc/k8s/pki/etcd.key
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.17.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.10.0.0/16
  podSubnet: 192.168.0.0/16
scheduler: {}
I checked the kube-apiserver.yaml generated by kubeadm. The --service-cluster-ip-range=10.10.0.0/16 setting does contain 10.10.0.10, as you can see below:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=172.16.5.151
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/k8s/pki/ca.pem
    - --etcd-certfile=/etc/k8s/pki/etcd.pem
    - --etcd-keyfile=/etc/k8s/pki/etcd.key
    - --etcd-servers=https://172.16.5.150:2379,https://172.16.5.151:2379,https://172.16.5.152:2379
    - --insecure-port=0
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-cluster-ip-range=10.10.0.0/16
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    image: registry.aliyuncs.com/google_containers/kube-apiserver:v1.17.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 172.16.5.151
        path: /healthz
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: kube-apiserver
    resources:
      requests:
        cpu: 250m
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /etc/k8s/pki
      name: etcd-certs-0
      readOnly: true
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /etc/k8s/pki
      type: DirectoryOrCreate
    name: etcd-certs-0
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
status: {}
As you can see above, the service IP range has been set to 10.10.0.0/16. It is strange that when I run "kubectl get svc" I get a kubernetes ClusterIP of 10.96.0.1:
[root@master02 manifests]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 2d3h
which means the default service IP range is still in effect and what I modified did not take. Does anyone know how to customize the service IP range, and how to solve my problem?
Posting this answer as community wiki to expand on it and explain the root cause.
When kubeadm is initialized without any flags ($ kubeadm init), it creates a kubeadm cluster with default values. You can check in the Kubernetes docs which flags can be specified during initialization and what their default values are.
--service-cidr string Default: "10.96.0.0/12"
Use alternative range of IP address for service VIPs.
That's the reason why the default kubernetes service used 10.96.0.1 as its ClusterIP.
Here the OP also wanted to use their own config:
--config string Path to a kubeadm configuration file.
Whole initialization workflow can be found here.
As the Kubernetes docs explain for kubeadm reset:
Performs a best effort revert of changes made by kubeadm init or kubeadm join.
Depending on our configuration, some configs sometimes stay on the cluster.
The issue that the OP encountered is mentioned there under External etcd clean up:
kubeadm reset will not delete any etcd data if external etcd is used. This means that if you run kubeadm init again using the same etcd endpoints, you will see state from previous clusters.
Regarding Immutable fields: Service “kube-dns” is invalid: spec.clusterIP: Invalid value: “10.10.0.10”: field is immutable.
In Kubernetes, some fields are secured to prevent changes that might disrupt working of the cluster.
If a field is immutable but we have to change it, the object must be removed and added again.
This node had joined the cluster as a worker node before. Later I reset it with the "kubeadm reset" command. After the reset, I joined it to the cluster in the master role. That is why I got the error in my question above.
The error occurs because the clusterIP range from before the reset is still recorded in the external etcd cluster, and the "kubeadm reset" command does not clean up the data in etcd, so the new clusterIP definition conflicts with the original one. The solution is to clean up the data in etcd and reset again. (Since the cluster I built is a test cluster, I cleaned etcd directly. Please be careful in a production environment.)
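For reference, a rough sketch of wiping an external etcd before re-running kubeadm init (this deletes every key; the endpoints and certificate paths are taken from the config in the question, and doing this is only sane on a throwaway test cluster):
ETCDCTL_API=3 etcdctl \
  --endpoints=https://172.16.5.150:2379,https://172.16.5.151:2379,https://172.16.5.152:2379 \
  --cacert=/etc/k8s/pki/ca.pem \
  --cert=/etc/k8s/pki/etcd.pem \
  --key=/etc/k8s/pki/etcd.key \
  del "" --prefix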

NFS volume mount times out on Kubernetes with incorrect IP?

97 Mounting command: systemd-run 98 Mounting arguments:
--description=Kubernetes transient mount for /var/lib/kubelet/pods/06b9ae42-8e99-11e9-b888-a44c24184b19/volumes/kubernetes.io~nfs/nfs-data
--scope -- mount -t nfs 10.100.155.82:/exports/www /var/lib/kubelet/pods/06b9ae42-8e99-11e9-b888-a44c24184b19/volumes/kubernetes.io~nfs/nfs-data
99 Output: Running scope as unit:
run-r26f9da6c287846589bec8d059c33441d.scope 100 mount.nfs: Connection
timed out
101 FailedMount Warning 2019-06-14T11:48:46Z
102 typo3-app-67b58d7657-cvqdg Pod Unable to mount volumes for pod
"typo3-app-67b58d7657-cvqdg_default(1fb4c719-8e9a-11e9-b888-a44c24184b19)":
timeout expired waiting for volumes to attach or mount for pod
"default"/"typo3-app-67b58d7657-cvqdg". list of unmounted
volumes=[nfs-data nfs-data-src]. list of unattached volumes=[nfs-data
nfs-data-src
default-token-lmtl4] FailedMount Warning 2019-06-14T11:49:04Z
Weirdly, I have no idea where it's getting 10.100.155.82 from. This was the previous ClusterIP of the NFS service...
apiVersion: apps/v1
kind: Deployment
metadata:
  name: typo3-app
  labels:
    app: typo3
spec:
  replicas: 1
  selector:
    matchLabels:
      app: typo3
  template:
    metadata:
      labels:
        app: typo3
    spec:
      containers:
      - name: app
        image: us.gcr.io/objit-chris/chrisjitit-typo3:v11
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /var/www/html-chrisjitit
          name: nfs-data
        - mountPath: /var/www/typo3_src-6.2.6
          name: nfs-data-src
      volumes:
      - name: nfs-data
        nfs:
          # https://github.com/kubernetes/minikube/issues/3417
          # server is not resolved using kube dns (so can't resolve to a service name - hence we need the IP)
          #server: 10.11.250.37
          server: 10.97.78.206
          path: /exports/www
      - name: nfs-data-src
        nfs:
          # https://github.com/kubernetes/minikube/issues/3417
          # server is not resolved using kube dns (so can't resolve to a service name - hence we need the IP)
          #server: 10.11.250.37
          server: 10.97.78.206
          path: /exports/www/typo3_src
What may be the cause of this timeout / wrong IP being used?
I tried deleting the deployment and changing the name; that still didn't seem to work, but a few minutes later it worked... Really strange behavior.
Ran into this issue again:
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
nfs-data 10Gi RWO Retain Available 67m
pvc-f1353542-a8b1-11e9-bdf7-38ffa66115bc 10Gi RWO Delete Bound default/nfs-data standard 67m
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nfs-data Bound pvc-f1353542-a8b1-11e9-bdf7-38ffa66115bc 10Gi RWO standard 67m
It keeps picking up an old NFS IP. It's a bug...:
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/9470ac17-a8b9-11e9-bdf7-38ffa66115bc/volumes/kubernetes.io~nfs/nfs-data-src --scope -- mount -t nfs 10.11.250.37:/exports/www/typo3_src /var/lib/kubelet/pods/9470ac17-a8b9-11e9-bdf7-38ffa66115bc/volumes/kubernetes.io~nfs/nfs-data-src
Output: Running scope as unit: run-ref2095cb52c94d0c87de5458c3b16733.scope
mount.nfs: Connection timed out

Enabling ExpandPersistentVolumes

I need to resize a bunch of PVCs. It seems the easiest way to do it is through
the ExpandPersistentVolumes feature. I am however having trouble getting the
configuration to cooperate.
The ExpandPersistentVolumes feature gate is set in kubelet on all three
masters, as shown:
(output trimmed to relevant bits for sanity)
$ parallel-ssh -h /tmp/masters -P "ps aux | grep feature"
172.20.53.249: root 15206 7.4 0.5 619888 83952 ? Ssl 19:52 0:02 /opt/kubernetes/bin/kubelet --feature-gates=ExpandPersistentVolumes=true,ExperimentalCriticalPodAnnotation=true
[1] 12:53:08 [SUCCESS] 172.20...
172.20.58.111: root 17798 4.5 0.5 636280 87328 ? Ssl 19:51 0:04 /opt/kubernetes/bin/kubelet --feature-gates=ExpandPersistentVolumes=true,ExperimentalCriticalPodAnnotation=true
[2] 12:53:08 [SUCCESS] 172.20...
172.20.53.240: root 9287 4.0 0.5 645276 90528 ? Ssl 19:50 0:06 /opt/kubernetes/bin/kubelet --feature-gates=ExpandPersistentVolumes=true,ExperimentalCriticalPodAnnotation=true
[3] 12:53:08 [SUCCESS] 172.20..
The apiserver has the PersistentVolumeClaimResize admission controller, as shown:
$ kubectl --namespace=kube-system get pod -o yaml | grep -i admission
/usr/local/bin/kube-apiserver --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction,PersistentVolumeClaimResize,ResourceQuota
/usr/local/bin/kube-apiserver --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction,PersistentVolumeClaimResize,ResourceQuota
/usr/local/bin/kube-apiserver --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction,PersistentVolumeClaimResize,ResourceQuota
However, when I create or edit a storage class to add allowVolumeExpansion,
it is removed on save. For example:
$ cat new-sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: null
  labels:
    k8s-addon: storage-aws.addons.k8s.io
  name: gp2-2
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2
parameters:
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-west-2:<omitted>
  type: gp2
  zone: us-west-2a
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
allowVolumeExpansion: true
$ kubectl create -f new-sc.yaml
storageclass "gp2-2" created
$ kubectl get sc gp2-2 -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: 2018-05-22T20:00:17Z
  labels:
    k8s-addon: storage-aws.addons.k8s.io
  name: gp2-2
  resourceVersion: "2546166"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2-2
  uid: <omitted>
parameters:
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-west-2:<omitted>
  type: gp2
  zone: us-west-2a
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
What am I missing? What is erasing this key from my storageclass configuration?
EDIT: Here is the command used by the kube-apiserver pods. It does not say anything about feature gates. The cluster was launched using Kops.
- /bin/sh
- -c
- mkfifo /tmp/pipe; (tee -a /var/log/kube-apiserver.log < /tmp/pipe & ) ; exec
/usr/local/bin/kube-apiserver --address=127.0.0.1 --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction,PersistentVolumeClaimResize,ResourceQuota
--allow-privileged=true --anonymous-auth=false --apiserver-count=3 --authorization-mode=RBAC
--basic-auth-file=/srv/kubernetes/basic_auth.csv --client-ca-file=/srv/kubernetes/ca.crt
--cloud-provider=aws --etcd-cafile=/srv/kubernetes/ca.crt --etcd-certfile=/srv/kubernetes/etcd-client.pem
--etcd-keyfile=/srv/kubernetes/etcd-client-key.pem --etcd-servers-overrides=/events#https://127.0.0.1:4002
--etcd-servers=https://127.0.0.1:4001 --insecure-port=8080 --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
--proxy-client-cert-file=/srv/kubernetes/apiserver-aggregator.cert --proxy-client-key-file=/srv/kubernetes/apiserver-aggregator.key
--requestheader-allowed-names=aggregator --requestheader-client-ca-file=/srv/kubernetes/apiserver-aggregator-ca.cert
--requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User --secure-port=443 --service-cluster-ip-range=100.64.0.0/13
--storage-backend=etcd3 --tls-cert-file=/srv/kubernetes/server.cert --tls-private-key-file=/srv/kubernetes/server.key
--token-auth-file=/srv/kubernetes/known_tokens.csv --v=1 > /tmp/pipe 2>&1
It can happen if you did not enable the alpha feature gate for this option.
Did you set the --feature-gates option for kube-apiserver?
--feature-gates mapStringBool - A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
...
ExpandPersistentVolumes=true|false (ALPHA - default=false)
...
Update: If you don't see this option in the command line arguments, you need to add it (--feature-gates=ExpandPersistentVolumes=true).
In case you run kube-apiserver as a pod, you should edit /etc/kubernetes/manifests/kube-apiserver.yaml and add the feature-gate option to the other arguments. kube-apiserver will restart automatically.
In case you run kube-apiserver as a process maintained by systemd, you should edit kube-apiserver.service or the service options $KUBE_API_ARGS in a separate file, and append the feature-gate option there. Restart the service with the systemctl restart kube-apiserver.service command.
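For the static-pod case, the relevant fragment of /etc/kubernetes/manifests/kube-apiserver.yaml would look roughly like this (a kubeadm-style manifest is assumed here; only the added flag is new, the existing arguments stay as they are):
spec:
  containers:
  - command:
    - kube-apiserver
    # ... existing flags ...
    - --feature-gates=ExpandPersistentVolumes=true
On a kops-managed cluster like the one in the question, the same flags are normally set through the cluster spec (kops edit cluster, under kubeAPIServer.featureGates and kubelet.featureGates) rather than by editing the generated manifest directly.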
After enabling it, you can create a StorageClass object with the allowVolumeExpansion option:
# kubectl get sc -o yaml --export
apiVersion: v1
items:
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    creationTimestamp: 2018-05-23T14:38:43Z
    labels:
      k8s-addon: storage-aws.addons.k8s.io
    name: gp2-2
    namespace: ""
    resourceVersion: "1385"
    selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2-2
    uid: fe516dcf-5e96-11e8-a86d-42010a9a0002
  parameters:
    encrypted: "true"
    kmsKeyId: arn:aws:kms:us-west-2:<omitted>
    type: gp2
    zone: us-west-2a
  provisioner: kubernetes.io/aws-ebs
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
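Once the class allows expansion, resizing a claim is then just a matter of bumping the request, for example (the claim name is a placeholder):
kubectl patch pvc my-claim -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
Depending on the volume plugin and Kubernetes version, the filesystem resize may only finish after the pod using the claim is restarted.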