Enabling ExpandPersistentVolumes - kubernetes

I need to resize a bunch of PVCs. It seems the easiest way to do it is through
the ExpandPersistentVolumes feature. I am however having trouble getting the
configuration to cooperate.
The ExpandPersistentVolumes feature gate is set in kubelet on all three
masters, as shown:
(output trimmed to relevant bits for sanity)
$ parallel-ssh -h /tmp/masters -P "ps aux | grep feature"
172.20.53.249: root 15206 7.4 0.5 619888 83952 ? Ssl 19:52 0:02 /opt/kubernetes/bin/kubelet --feature-gates=ExpandPersistentVolumes=true,ExperimentalCriticalPodAnnotation=true
[1] 12:53:08 [SUCCESS] 172.20...
172.20.58.111: root 17798 4.5 0.5 636280 87328 ? Ssl 19:51 0:04 /opt/kubernetes/bin/kubelet --feature-gates=ExpandPersistentVolumes=true,ExperimentalCriticalPodAnnotation=true
[2] 12:53:08 [SUCCESS] 172.20...
172.20.53.240: root 9287 4.0 0.5 645276 90528 ? Ssl 19:50 0:06 /opt/kubernetes/bin/kubelet --feature-gates=ExpandPersistentVolumes=true,ExperimentalCriticalPodAnnotation=true
[3] 12:53:08 [SUCCESS] 172.20..
The apiserver has the PersistentVolumeClaimResize admission controller, as shown:
$ kubectl --namespace=kube-system get pod -o yaml | grep -i admission
/usr/local/bin/kube-apiserver --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction,PersistentVolumeClaimResize,ResourceQuota
/usr/local/bin/kube-apiserver --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction,PersistentVolumeClaimResize,ResourceQuota
/usr/local/bin/kube-apiserver --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction,PersistentVolumeClaimResize,ResourceQuota
However, when I create or edit a storage class to add allowVolumeExpansion,
it is removed on save. For example:
$ cat new-sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: null
  labels:
    k8s-addon: storage-aws.addons.k8s.io
  name: gp2-2
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2
parameters:
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-west-2:<omitted>
  type: gp2
  zone: us-west-2a
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
allowVolumeExpansion: true
$ kubectl create -f new-sc.yaml
storageclass "gp2-2" created
$ kubectl get sc gp2-2 -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: 2018-05-22T20:00:17Z
  labels:
    k8s-addon: storage-aws.addons.k8s.io
  name: gp2-2
  resourceVersion: "2546166"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2-2
  uid: <omitted>
parameters:
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-west-2:<omitted>
  type: gp2
  zone: us-west-2a
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
What am I missing? What is erasing this key from my storageclass configuration?
EDIT: Here is the command used by the kube-apiserver pods. It does not say anything about feature gates. The cluster was launched using Kops.
- /bin/sh
- -c
- mkfifo /tmp/pipe; (tee -a /var/log/kube-apiserver.log < /tmp/pipe & ) ; exec
/usr/local/bin/kube-apiserver --address=127.0.0.1 --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction,PersistentVolumeClaimResize,ResourceQuota
--allow-privileged=true --anonymous-auth=false --apiserver-count=3 --authorization-mode=RBAC
--basic-auth-file=/srv/kubernetes/basic_auth.csv --client-ca-file=/srv/kubernetes/ca.crt
--cloud-provider=aws --etcd-cafile=/srv/kubernetes/ca.crt --etcd-certfile=/srv/kubernetes/etcd-client.pem
--etcd-keyfile=/srv/kubernetes/etcd-client-key.pem --etcd-servers-overrides=/events#https://127.0.0.1:4002
--etcd-servers=https://127.0.0.1:4001 --insecure-port=8080 --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
--proxy-client-cert-file=/srv/kubernetes/apiserver-aggregator.cert --proxy-client-key-file=/srv/kubernetes/apiserver-aggregator.key
--requestheader-allowed-names=aggregator --requestheader-client-ca-file=/srv/kubernetes/apiserver-aggregator-ca.cert
--requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User --secure-port=443 --service-cluster-ip-range=100.64.0.0/13
--storage-backend=etcd3 --tls-cert-file=/srv/kubernetes/server.cert --tls-private-key-file=/srv/kubernetes/server.key
--token-auth-file=/srv/kubernetes/known_tokens.csv --v=1 > /tmp/pipe 2>&1

This can happen if the alpha feature gate for the option is not enabled. Did you set the --feature-gates option on kube-apiserver?
--feature-gates mapStringBool - A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
...
ExpandPersistentVolumes=true|false (ALPHA - default=false)
...
Update: If you don't see this option in the command-line arguments, you need to add it (--feature-gates=ExpandPersistentVolumes=true).
If you run kube-apiserver as a pod, edit /etc/kubernetes/manifests/kube-apiserver.yaml and add the feature-gate option to the other arguments. kube-apiserver will restart automatically.
If you run kube-apiserver as a process maintained by systemd, edit kube-apiserver.service or the $KUBE_API_ARGS service options in a separate file, and append the feature-gate option there. Restart the service with the systemctl restart kube-apiserver.service command.
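For a static-pod setup like the kops cluster shown above, that means appending the flag to the kube-apiserver command in the manifest, for example (a trimmed sketch, not the exact manifest from this cluster):
spec:
  containers:
  - name: kube-apiserver
    command:
    - /bin/sh
    - -c
    - exec /usr/local/bin/kube-apiserver --admission-control=...,PersistentVolumeClaimResize,... --feature-gates=ExpandPersistentVolumes=true ... > /tmp/pipe 2>&1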
After enabling it, you can create a StorageClass object with allowVolumeExpansion option:
# kubectl get sc -o yaml --export
apiVersion: v1
items:
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    creationTimestamp: 2018-05-23T14:38:43Z
    labels:
      k8s-addon: storage-aws.addons.k8s.io
    name: gp2-2
    namespace: ""
    resourceVersion: "1385"
    selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2-2
    uid: fe516dcf-5e96-11e8-a86d-42010a9a0002
  parameters:
    encrypted: "true"
    kmsKeyId: arn:aws:kms:us-west-2:<omitted>
    type: gp2
    zone: us-west-2a
  provisioner: kubernetes.io/aws-ebs
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Related

Velero + MinIO: Unknown desc = AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-east-1' is wrong;

I'm getting the issue below. Does anyone have an idea what could be wrong?
user#master-1:~$ kubectl logs -n velero velero-77b544f457-dw4hf
# REMOVED
An error occurred: some backup storage locations are invalid: backup store for location "aws" is invalid: rpc error: code = Unknown desc = AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'us-west-2'
status code: 400, request id: A3Q97JKM6GQRNABA, host id: b6g0on189w6hYgCrId/Xr0BP44pXjZPy2SqK2t7bn/+Ggq9FUY2N3KQHYRcMEuCCHY2L2vfsYEo=; backup store for location "velero" is invalid: rpc error: code = Unknown desc = AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'us-west-2'
status code: 400, request id: YF6DRKN7MYSXVBV4, host id: Y8/Gufd7R0BlZCIZqbJPfdAjVqK8+WLfWoANBDnipDkH421/vGt0Ne2E/yZw2bYf7rfms+rGxsg=
user#master-1:~$
I have installed Velero 1.4.2 with Helm chart:
user#master-1:~$ helm search repo velero --versions | grep -e 2.12.17 -e NAME
NAME                  CHART VERSION   APP VERSION   DESCRIPTION
vmware-tanzu/velero   2.12.17         1.4.2         A Helm chart for velero
user#master-1:~$
I used this command to install:
helm install velero vmware-tanzu/velero --namespace velero --version 2.12.17 -f velero-values.yaml \
--set-file credentials.secretContents.cloud=/home/era/creds-root.txt \
--set configuration.provider=aws \
--set configuration.backupStorageLocation.name=velero \
--set configuration.backupStorageLocation.bucket="velero" \
--set configuration.backupStorageLocation.prefix="" \
--set configuration.backupStorageLocation.config.region="us-east-1" \
--set image.repository=velero/velero \
--set image.tag=v1.4.2 \
--set image.pullPolicy=IfNotPresent \
--set initContainers[0].name=velero-plugin-for-aws \
--set initContainers[0].image=velero/velero-plugin-for-aws:v1.1.0 \
--set initContainers[0].volumeMounts[0].mountPath=/target \
--set initContainers[0].volumeMounts[0].name=plugins \
--replace
My credentials file passed:
$ cat creds-root.txt
[default]
aws_access_key_id=12345678
aws_secret_access_key=12345678
Helm values file:
user#master-1:~$ cat velero-values.yaml
configuration:
  provider: aws
  backupStorageLocation:
    name: minio
    provider: aws
    # caCert: null
    bucket: velero
    config:
      region: us-east-1
credentials:
  useSecret: true
  existingSecret: cloud-credentials
  secretContents: {}
  extraEnvVars: {}
backupsEnabled: true
snapshotsEnabled: true
deployRestic: true
MinIO snapshot resource (MinIO is working at 192.168.2.239:9000):
# For MinIO
---
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: minio
  namespace: velero
spec:
  provider: openebs.io/cstor-blockstore
  config:
    bucket: velero
    prefix: cstor
    provider: aws
    # The region where the server is located.
    region: us-east-1
    # profile for credential, if not mentioned then plugin will use profile=default
    profile: user1
    # Whether to use path-style addressing instead of virtual hosted bucket addressing.
    # Set to "true"
    s3ForcePathStyle: "true"
    # S3 URL, By default it will be generated from "region" and "bucket"
    s3Url: http://192.168.2.239:9000
    # You can specify the multipart_chunksize here for explicitness.
    # multiPartChunkSize can be from 5Mi(5*1024*1024 Bytes) to 5Gi
    # For more information: https://docs.min.io/docs/minio-server-limits-per-tenant.html
    # If not set then it will be calculated from the file size
    multiPartChunkSize: 64Mi
    # If MinIO is configured with custom certificate then certificate can be passed to plugin through caCert
    # Value of caCert must be base64 encoded
    # To encode, execute command: cat ca.crt |base64 -w 0
    # caCert: LS0tLS1CRU...tRU5EIENFUlRJRklDQVRFLS0tLS0K
    # If you want to disable certificate verification then set insecureSkipTLSVerify to "true"
    # By default insecureSkipTLSVerify is set to "false"
    insecureSkipTLSVerify: "true"
The aws backup storage location, which seems to be failing:
$ k get backupstoragelocation -n velero aws -o yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  annotations:
    helm.sh/hook: post-install,post-upgrade
    helm.sh/hook-delete-policy: before-hook-creation
  creationTimestamp: "2021-04-15T08:23:38Z"
  generation: 3
  labels:
    app.kubernetes.io/instance: velero
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: velero
    helm.sh/chart: velero-2.12.17
  managedFields:
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:helm.sh/hook: {}
          f:helm.sh/hook-delete-policy: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/instance: {}
          f:app.kubernetes.io/managed-by: {}
          f:app.kubernetes.io/name: {}
          f:helm.sh/chart: {}
      f:spec:
        .: {}
        f:config:
          .: {}
          f:region: {}
        f:objectStorage:
          .: {}
          f:prefix: {}
        f:provider: {}
    manager: Go-http-client
    operation: Update
    time: "2021-04-15T08:23:38Z"
  - apiVersion: velero.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:objectStorage:
          f:bucket: {}
    manager: kubectl-edit
    operation: Update
    time: "2021-04-15T17:52:46Z"
  name: aws
  namespace: velero
  resourceVersion: "1333724"
  selfLink: /apis/velero.io/v1/namespaces/velero/backupstoragelocations/aws
  uid: a51033b2-e53d-4751-9110-c9649de6aa67
spec:
  config:
    region: us-east-1
  objectStorage:
    bucket: velero
    prefix: backup
  provider: aws
user#master-1:~$
For some reason no plugins are listed:
user#master-1:~$ velero plugin get
user#master-1:~$
Velero is obviously crashing because of the original issue:
user#master-1:~$ kubectl get pods -n velero
NAME                      READY   STATUS             RESTARTS   AGE
restic-nqpsl              1/1     Running            0          7m52s
restic-pw897              1/1     Running            0          7m52s
restic-rtwzd              1/1     Running            0          7m52s
velero-77b544f457-dw4hf   0/1     CrashLoopBackOff   5          5m59s
user#master-1:~$
More resources:
user#master-1:~$ k get BackupStorageLocation -n velero
NAME     PHASE   LAST VALIDATED   AGE
aws                               10h
velero                            11m
user#master-1:~$ k get volumesnapshotlocation -n velero
NAME              AGE
default           11m
minio             39h
velero-snapshot   9h
user#master-1:~$
My MinIO service is started using Docker Compose and working fine:
version: '3.8'
services:
  minio:
    container_name: minio
    hostname: minio
    build:
      context: .
      dockerfile: Dockerfile
    restart: always
    ports:
      - "0.0.0.0:9000:9000"
    environment:
      # ROOT
      MINIO_ACCESS_KEY: 12345678
      MINIO_SECRET_KEY: 12345678
      MINIO_REGION: us-east-1
    command: server --address :9000 /data
    volumes:
      - ./data:/data
Unknown PHASE for backup locations:
user#master-1:~$ velero get backup-locations
NAME     PROVIDER   BUCKET/PREFIX   PHASE     LAST VALIDATED   ACCESS MODE
aws      aws        velero/backup   Unknown   Unknown          ReadWrite
velero   aws        velero          Unknown   Unknown          ReadWrite
user#master-1:~$
Test MinIO access separately:
bash-4.3# AWS_ACCESS_KEY_ID=12345678 AWS_SECRET_ACCESS_KEY=12345678 aws s3api get-bucket-location --endpoint-url http://192.168.2.239:9000 --bucket velero
{
"LocationConstraint": "us-east-1"
}
bash-4.3#
Secrets are correct:
user#master-1:~$ k get secret -n velero cloud-credentials -o yaml | head -n 4
apiVersion: v1
data:
cloud: W2RlZmF-REMOVED
kind: Secret
user#master-1:~$
user#master-1:~$ k get secret -n velero
NAME                           TYPE                                  DATA   AGE
cloud-credentials              Opaque                                1      91m
default-token-8rwhg            kubernetes.io/service-account-token   3      2d20h
sh.helm.release.v1.velero.v1   helm.sh/release.v1                    1      45m
velero                         Opaque                                0      2d19h
velero-restic-credentials      Opaque                                1      40h
velero-server-token-8zm9k      kubernetes.io/service-account-token   3      45m
user#master-1:~$
The problem was missing configuration:
--set configuration.backupStorageLocation.config.s3Url="http://192.168.2.239:9000" \
--set configuration.backupStorageLocation.config.s3ForcePathStyle=true \
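With those two values set, the backup storage location config rendered by the chart should end up looking roughly like this (a sketch for illustration; the key names follow the Velero AWS plugin documentation):
spec:
  provider: aws
  objectStorage:
    bucket: velero
  config:
    region: us-east-1
    s3ForcePathStyle: "true"
    s3Url: http://192.168.2.239:9000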

failed to register an Autopilot GKE cluster to Anthos

I am trying to add an existing GKE cluster (an Autopilot one) to Anthos within the same project. It updated the hub memberships; however, the gke-connect agent pod is failing with an RBAC-related error.
$ for ns in $(kubectl get ns -o jsonpath={.items..metadata.name} -l hub.gke.io/project); do
> echo "======= Logs $ns ======="
> kubectl logs -n $ns -l app=gke-connect-agent
> done
======= Logs gke-connect =======
2021/03/26 15:57:50.604149 gkeconnect_agent.go:39: GKE Connect Agent. Log timestamps in UTC.
2021/03/26 15:57:50.604380 gkeconnect_agent.go:40:
Built on: 2021-03-19 09:40:57 +0000 UTC
Built at: 363842994
Build Status: mint
Build Label: 20210319-01-00
2021/03/26 15:57:50.715289 gkeconnect_agent.go:50: error creating kubernetes
connect agent: unable to retrieve namespace "kube-system" to be used as
connectionID: namespaces "kube-system" is forbidden: User
"system:serviceaccount:gke-connect:connect-agent-sa" cannot get resource
"namespaces" in API group "" in the namespace "kube-system"
I checked the rolebindings for the connect-agent-sa service account; the role seems to have the necessary permissions to get namespaces, but it is still failing.
$ k get role gke-connect-agent-20210319-01-00 -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: "2021-03-26T16:35:12Z"
  labels:
    hub.gke.io/project: xxxxxxxxxxxxxxxxxxx
    version: 20210319-01-00
  managedFields:
  - apiVersion: rbac.authorization.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:hub.gke.io/project: {}
          f:version: {}
      f:rules: {}
    manager: GoogleCloudConsole
    operation: Update
    time: "2021-03-26T16:35:12Z"
  name: gke-connect-agent-20210319-01-00
  namespace: gke-connect
  resourceVersion: "10595136"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/gke-connect/roles/gke-connect-agent-20210319-01-00
  uid: xxxxxxxx
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  - namespaces   # <-- namespaces!!!
  - configmaps
  verbs:
  - get          # <-- get!!!
  - watch
  - list
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
Are there any other restrictions or policies that I am unaware of? Is it because of the Autopilot cluster?
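A quick way to double-check what the service account is actually allowed to do is impersonation (a diagnostic sketch against standard RBAC only; Autopilot may layer its own constraints on top):
kubectl auth can-i get namespaces -n kube-system \
  --as=system:serviceaccount:gke-connect:connect-agent-sa
kubectl get rolebinding,clusterrolebinding -A -o wide | grep connect-agent-sa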

Kubectl delete tls when no namespace

There was a namespace "sandbox" on the node which was deleted, but there is still a challenge for a certificate "echo-tls".
But I can no longer access the sandbox namespace to delete this cert.
Could anyone help me delete this resource?
Here are the logs of cert-manager:
Found status change for Certificate "echo-tls" condition "Ready": "True" -> "False"; setting lastTransitionTime to...
cert-manager/controller/CertificateReadiness "msg"="re-queuing item due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"echo-tls\": StorageError: invalid object, Code: 4, Key: /cert-manager.io/certificates/sandbox/echo-tls, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: ..., UID in object meta: " "key"="sandbox/echo-tls"
After restarting the cert-manager pod, here are the logs:
cert-manager/controller/certificaterequests/handleOwnedResource "msg"="error getting referenced owning resource" "error"="certificaterequest.cert-manager.io \"echo-tls-bkmm8\" not found" "related_resource_kind"="CertificateRequest" "related_resource_name"="echo-tls-bkmm8" "related_resource_namespace"="sandbox" "resource_kind"="Order" "resource_name"="echo-tls-bkmm8-1177139468" "resource_namespace"="sandbox" "resource_version"="v1"
cert-manager/controller/orders "msg"="re-queuing item due to error processing" "error"="ACME client for issuer not initialised/available" "key"="sandbox/echo-tls-dwpt4-1177139468"
And then the same logs as before
The issuer:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: ***
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress: {}
The configs for deployment:
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: <APP_NAME>
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: nginx-<ENV>
    acme.cert-manager.io/http01-ingress-class: nginx-<ENV>
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - ***.fr
    secretName: <APP_NAME>-tls
  rules:
  - host: ***.fr
    http:
      paths:
      - backend:
          serviceName: <APP_NAME>
          servicePort: 80
.k8s_config: &k8s_config
  before_script:
    - export HOME=/tmp
    - export K8S_NAMESPACE="${APP_NAME}"
    - kubectl config set-cluster k8s --server="${K8S_SERVER}"
    - kubectl config set clusters.k8s.certificate-authority-data ${K8S_CA_DATA}
    - kubectl config set-credentials default --token="${K8S_USER_TOKEN}"
    - kubectl config set-context default --cluster=k8s --user=default --namespace=default
    - kubectl config set-context ${K8S_NAMESPACE} --cluster=k8s --user=default --namespace=${K8S_NAMESPACE}
    - kubectl config use-context default
    - if [ -z `kubectl get namespace ${K8S_NAMESPACE} --no-headers --output=go-template={{.metadata.name}} 2>/dev/null` ]; then kubectl create namespace ${K8S_NAMESPACE}; fi
    - if [ -z `kubectl --namespace=${K8S_NAMESPACE} get secret *** --no-headers --output=go-template={{.metadata.name}} 2>/dev/null` ]; then kubectl get secret *** --output yaml | sed "s/namespace:\ default/namespace:\ ${K8S_NAMESPACE}/" | kubectl create -f - ; fi
    - kubectl config use-context ${K8S_NAMESPACE}
Usually certificates are stored inside Kubernetes secrets: https://kubernetes.io/docs/concepts/configuration/secret/#tls-secrets. You can retrieve secrets using kubectl get secrets --all-namespaces. You can also check which secrets are used by a given pod by checking its YAML description: kubectl get pods -n <pod-namespace> -o yaml (additional information: https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-files-from-a-pod)
A namespace is cluster-wide; it is not located on any node, so deleting a node does not delete any namespace.
If the above does not fit your need, could you please provide some YAML files and command-line instructions that would allow reproducing the problem?
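To hunt down the leftover cert-manager objects across all namespaces, something like this can help (a hedged sketch; the resource names come from the logs above and the exact list of CRDs depends on the cert-manager version):
kubectl get certificates,certificaterequests,orders,challenges --all-namespaces | grep echo-tls
kubectl get secret --all-namespaces | grep echo-tls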
Finally, this Sunday the cert-manager stopped the challenges on the old TLS certificate without any other action.

unknown field "configMap" in io.k8s.api.core.v1.NodeConfigSource

I am trying to add kubelet parameters so that garbage collection is taken care of automatically. I followed the steps below, and while editing the node I get the error "unknown field "configMap" in io.k8s.api.core.v1.NodeConfigSource".
Step 1:
kubectl proxy --port=8001 &
Step 2:
Pulled the current config file
NODE_NAME="the-name-of-the-node-you-are-reconfiguring"; curl -sSL "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/configz" | jq '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"' > kubelet_configz_${NODE_NAME}
Step 3:
I edited these values:
"imageGCHighThresholdPercent": 70,
"imageGCLowThresholdPercent": 65,
Step 4:
Pushed the configuration to the control plane:
kubectl -n kube-system create configmap my-node-config --from-file=kubelet=kubelet_configz_${NODE_NAME} --append-hash -o yaml
Step 5:
Edited the node:
kubectl edit node ${NODE_NAME}
Added configSource to it:
configSource:
  configMap:
    name: CONFIG_MAP_NAME   # my new created configmap name added
    namespace: kube-system
    kubeletConfigKey: kubelet
While saving the node edit, I get the error "unknown field "configMap" in io.k8s.api.core.v1.NodeConfigSource".
My node info
nodeInfo:
  architecture: amd64
  bootID: 951c736d-9a2c-4a81-bf32-922c53970ab3
  containerRuntimeVersion: docker://17.3.2
  kernelVersion: 3.10.0-693.11.6.el7.x86_64
  kubeProxyVersion: v1.10.6
  kubeletVersion: v1.10.6
  machineID: 609bbd29e32a4898e604f49bff82a88c
  operatingSystem: linux
  osImage: CentOS Linux 7 (Core)
  systemUUID: EC20197C-6279-B13C-6A3A-000FDAC5C4E8
apiVersion: v1
items:
- apiVersion: v1
  kind: Node
  metadata:
    annotations:
      node.alpha.kubernetes.io/ttl: "0"
      volumes.kubernetes.io/controller-managed-attach-detach: "true"
Spec Info:
spec:
  externalID: i-0f84faccd78dff3b3
  podCIDR: 109.92.5.0/24
  providerID: aws:///ap-south-1a/i-0foh4faccdsdcns3b3
Ref link: https://kubernetes.io/docs/tasks/administer-cluster/reconfigure-kubelet/
configSource:
  configMap:
    name: CONFIG_MAP_NAME
    namespace: kube-system
    kubeletConfigKey: kubelet
Try this once.

Kubernetes: Failed to get GCE GCECloudProvider with error <nil>

I have set up a custom kubernetes cluster on GCE using kubeadm. I am trying to use StatefulSets with persistent storage.
I have the following configuration:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gce-slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zones: europe-west3-b
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myname
  labels:
    app: myapp
spec:
  serviceName: myservice
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: mycontainer
        image: ubuntu:16.04
        env:
        volumeMounts:
        - name: myapp-data
          mountPath: /srv/data
      imagePullSecrets:
      - name: sitesearch-secret
  volumeClaimTemplates:
  - metadata:
      name: myapp-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: gce-slow
      resources:
        requests:
          storage: 1Gi
And I get the following error:
Nopx#vm0:~$ kubectl describe pvc
Name: myapp-data-myname-0
Namespace: default
StorageClass: gce-slow
Status: Pending
Volume:
Labels: app=myapp
Annotations: volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/gce-pd
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 5s persistentvolume-controller Failed to provision volume
with StorageClass "gce-slow": Failed to get GCE GCECloudProvider with error <nil>
I am treading in the dark and do not know what is missing. It seems logical that it doesn't work, since the provisioner never authenticates to GCE. Any hints and pointers are very much appreciated.
EDIT
I tried the solution here, by editing the config file in kubeadm with kubeadm config upload from-file; however, the error persists. The kubeadm config looks like this right now:
api:
  advertiseAddress: 10.156.0.2
  bindPort: 6443
  controlPlaneEndpoint: ""
auditPolicy:
  logDir: /var/log/kubernetes/audit
  logMaxAge: 2
  path: ""
authorizationModes:
- Node
- RBAC
certificatesDir: /etc/kubernetes/pki
cloudProvider: gce
criSocket: /var/run/dockershim.sock
etcd:
  caFile: ""
  certFile: ""
  dataDir: /var/lib/etcd
  endpoints: null
  image: ""
  keyFile: ""
imageRepository: k8s.gcr.io
kubeProxy:
  config:
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: 192.168.0.0/16
    configSyncPeriod: 15m0s
    conntrack:
      max: null
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      minSyncPeriod: 0s
      scheduler: ""
      syncPeriod: 30s
    metricsBindAddress: 127.0.0.1:10249
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    resourceContainer: /kube-proxy
    udpIdleTimeout: 250ms
kubeletConfiguration: {}
kubernetesVersion: v1.10.2
networking:
  dnsDomain: cluster.local
  podSubnet: 192.168.0.0/16
  serviceSubnet: 10.96.0.0/12
nodeName: mynode
privilegedPods: false
token: ""
tokenGroups:
- system:bootstrappers:kubeadm:default-node-token
tokenTTL: 24h0m0s
tokenUsages:
- signing
- authentication
unifiedControlPlaneImage: ""
Edit
The issue was resolved in the comments thanks to Anton Kostenko. The last edit coupled with kubeadm upgrade solves the problem.
The answer took me a while but here it is:
Using the GCECloudProvider in Kubernetes outside of the Google Kubernetes Engine has the following prerequisites (the last point is Kubeadm specific):
The VM needs to run with a service account that has the right to provision disks. Info on how to run a VM with a service account can be found here.
The kubelet needs to run with the argument --cloud-provider=gce. For this, the KUBELET_KUBECONFIG_ARGS in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf has to be edited. The kubelet can then be restarted with
sudo systemctl restart kubelet
The Kubernetes cloud-config file needs to be configured. The file can be found at /etc/kubernetes/cloud-config and the following content is enough to get the cloud provider to work:
[Global]
project-id = "<google-project-id>"
Kubeadm needs to have GCE configured as its cloud provider. The config posted in the question works fine for this. However, the nodeName has to be changed.
Creating dynamic persistent volumes on Kubernetes nodes running in Google Cloud virtual machines:
GCP role:
In the Google Cloud console, go to IAM & Admin.
Add a new service account, e.g. gce-user.
Add the role "Compute Instance Admin".
Attach the service account to the GCP VM:
Stop the instance and click Edit.
Click Service account and select the new account, e.g. gce-user.
Start the virtual machine.
Add the GCE parameter to the kubelet on all nodes by adding "--cloud-provider=gce":
sudo vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Add the value:
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=gce"
Create a new file /etc/kubernetes/cloud-config on all nodes and add this parameter:
[Global]
project-id = "xxxxxxxxxxxx"
Restart the kubelet.
Add gce to the controller-manager on the master:
vi /etc/kubernetes/manifests/kube-controller-manager.yaml
Add this parameter under the command section:
--cloud-provider=gce
Then restart the control plane.
Run ps -ef | grep controller; you should then see "gce" in the controller-manager output.
Note: The above method is not recommended on a production system; use kubeadm config to update the controller-manager settings.
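For reference, with a newer kubeadm API the same settings would be expressed roughly like this (a sketch assuming the kubeadm.k8s.io/v1beta2 ClusterConfiguration schema, not the v1alpha1 config shown in the question):
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  extraArgs:
    cloud-provider: gce
controllerManager:
  extraArgs:
    cloud-provider: gce
    cloud-config: /etc/kubernetes/cloud-config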