Failed to register an Autopilot GKE cluster to Anthos - kubernetes

I am trying to add an existing GKE cluster (an Autopilot one) to Anthos within the same project. The registration updated the hub memberships; however, the gke-connect agent pod is failing with an RBAC-related error.
$ for ns in $(kubectl get ns -o jsonpath={.items..metadata.name} -l hub.gke.io/project); do
> echo "======= Logs $ns ======="
> kubectl logs -n $ns -l app=gke-connect-agent
> done
======= Logs gke-connect =======
2021/03/26 15:57:50.604149 gkeconnect_agent.go:39: GKE Connect Agent. Log timestamps in UTC.
2021/03/26 15:57:50.604380 gkeconnect_agent.go:40:
Built on: 2021-03-19 09:40:57 +0000 UTC
Built at: 363842994
Build Status: mint
Build Label: 20210319-01-00
2021/03/26 15:57:50.715289 gkeconnect_agent.go:50: error creating kubernetes
connect agent: unable to retrieve namespace "kube-system" to be used as
connectionID: namespaces "kube-system" is forbidden: User
"system:serviceaccount:gke-connect:connect-agent-sa" cannot get resource
"namespaces" in API group "" in the namespace "kube-system"
I checked the rolebindings for the connect-agent-sa service account, and the role appears to have the necessary permissions to get namespaces, yet the request is failing.
$ k get role gke-connect-agent-20210319-01-00 -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: "2021-03-26T16:35:12Z"
  labels:
    hub.gke.io/project: xxxxxxxxxxxxxxxxxxx
    version: 20210319-01-00
  managedFields:
  - apiVersion: rbac.authorization.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:hub.gke.io/project: {}
          f:version: {}
      f:rules: {}
    manager: GoogleCloudConsole
    operation: Update
    time: "2021-03-26T16:35:12Z"
  name: gke-connect-agent-20210319-01-00
  namespace: gke-connect
  resourceVersion: "10595136"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/gke-connect/roles/gke-connect-agent-20210319-01-00
  uid: xxxxxxxx
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  - namespaces   # <-- namespaces!!!
  - configmaps
  verbs:
  - get          # <-- get!!!
  - watch
  - list
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
Are there any other restrictions or policies that I am unaware of? Is this because it is an Autopilot cluster?
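A useful way to confirm what the agent's service account is actually allowed to do is `kubectl auth can-i` with impersonation. This is a debugging sketch, not part of the registration flow; note that a namespaced Role only grants access within its own namespace (gke-connect here), so it cannot authorize a get of the kube-system namespace:

```
# Probe the exact request that is failing: reading the kube-system namespace
kubectl auth can-i get namespaces \
  --as=system:serviceaccount:gke-connect:connect-agent-sa \
  -n kube-system

# Compare with the namespace the Role actually covers
kubectl auth can-i get namespaces \
  --as=system:serviceaccount:gke-connect:connect-agent-sa \
  -n gke-connect
```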

Related

How do I fix a role-based problem when my role appears to have the correct permissions?

I set up the namespace "sandbox" in Kubernetes and have been using it for several days without issue. Today I got the error below.
I have checked to make sure that I have all of the requisite configmaps in place.
Is there a log or something where I can find what this is referring to?
panic: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
I did find this (MountVolume.SetUp failed for volume "kube-api-access-fcz9j" : object "default"/"kube-root-ca.crt" not registered) thread and have applied the below patch to my service account, but I am still getting the same error.
automountServiceAccountToken: false
UPDATE:
In answer to @p10l: I am working on a bare-metal cluster, version 1.23.0. No Terraform.
I am getting closer, but still not there.
This appears to be another RBAC problem, but the error does not make sense to me.
I have a user "dma". I am running workflows in the "sandbox" namespace using the context dma@kubernetes.
The error now is
Create request failed: workflows.argoproj.io is forbidden: User "dma" cannot create resource "workflows" in API group "argoproj.io" in the namespace "sandbox"
but that user indeed appears to have the correct permissions.
This is the output of
kubectl get role dma -n sandbox -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"Role","metadata":{"annotations":{},"name":"dma","namespace":"sandbox"},"rules":[{"apiGroups":["","apps","autoscaling","batch","extensions","policy","rbac.authorization.k8s.io","argoproj.io"],"resources":["pods","configmaps","deployments","events","pods","persistentvolumes","persistentvolumeclaims","services","workflows"],"verbs":["get","list","watch","create","update","patch","delete"]}]}
  creationTimestamp: "2021-12-21T19:41:38Z"
  name: dma
  namespace: sandbox
  resourceVersion: "1055045"
  uid: 94191881-895d-4457-9764-5db9b54cdb3f
rules:
- apiGroups:
  - ""
  - apps
  - autoscaling
  - batch
  - extensions
  - policy
  - rbac.authorization.k8s.io
  - argoproj.io
  - workflows.argoproj.io
  resources:
  - pods
  - configmaps
  - deployments
  - events
  - pods
  - persistentvolumes
  - persistentvolumeclaims
  - services
  - workflows
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
This is the output of kubectl get rolebinding -n sandbox dma-sandbox-rolebinding -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"RoleBinding","metadata":{"annotations":{},"name":"dma-sandbox-rolebinding","namespace":"sandbox"},"roleRef":{"apiGroup":"rbac.authorization.k8s.io","kind":"Role","name":"dma"},"subjects":[{"kind":"ServiceAccount","name":"dma","namespace":"sandbox"}]}
  creationTimestamp: "2021-12-21T19:56:06Z"
  name: dma-sandbox-rolebinding
  namespace: sandbox
  resourceVersion: "1050593"
  uid: d4d53855-b5fc-4f29-8dbd-17f682cc91dd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dma
subjects:
- kind: ServiceAccount
  name: dma
  namespace: sandbox
The issue you are describing is a recurring one, described here and here, and occurs when your cluster lacks the KUBECONFIG environment variable.
First, run echo $KUBECONFIG on all your nodes to see if it's empty.
If it is, look for the config file in your cluster, copy it to all the nodes, then export the variable by running export KUBECONFIG=/path/to/config. This file can usually be found at ~/.kube/config or /etc/kubernetes/admin.conf on master nodes.
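The copy-and-export steps above can be sketched as follows (paths and the node name are common defaults, not taken from your cluster):

```
# On the control-plane node, where the admin kubeconfig usually lives:
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown "$(id -u):$(id -g)" $HOME/.kube/config

# Copy it to the other nodes (worker-1 is a placeholder name):
scp $HOME/.kube/config user@worker-1:.kube/config

# On every node, point kubectl at it:
export KUBECONFIG=$HOME/.kube/config
kubectl config view --minify    # quick sanity check
```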
Let me know if this solution worked in your case.

Argo Workflow + Spark operator + App logs not generated

I am in the very early stages of exploring Argo with the Spark operator to run Spark samples on a minikube setup on my EC2 instance.
Below are the resource details; I am not sure why I am not able to see the Spark app logs.
WORKFLOW.YAML
kind: Workflow
metadata:
  name: spark-argo-groupby
spec:
  entrypoint: sparkling-operator
  templates:
  - name: spark-groupby
    resource:
      action: create
      manifest: |
        apiVersion: "sparkoperator.k8s.io/v1beta2"
        kind: SparkApplication
        metadata:
          generateName: spark-argo-groupby
        spec:
          type: Scala
          mode: cluster
          image: gcr.io/spark-operator/spark:v3.0.3
          imagePullPolicy: Always
          mainClass: org.apache.spark.examples.GroupByTest
          mainApplicationFile: local:///opt/spark/spark-examples_2.12-3.1.1-hadoop-2.7.jar
          sparkVersion: "3.0.3"
          driver:
            cores: 1
            coreLimit: "1200m"
            memory: "512m"
            labels:
              version: 3.0.0
          executor:
            cores: 1
            instances: 1
            memory: "512m"
            labels:
              version: 3.0.0
  - name: sparkling-operator
    dag:
      tasks:
      - name: SparkGroupBY
        template: spark-groupby
ROLES
# Role for spark-on-k8s-operator to create resources on cluster
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-cluster-cr
  labels:
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
rules:
- apiGroups:
  - sparkoperator.k8s.io
  resources:
  - sparkapplications
  verbs:
  - '*'
---
# Allow airflow-worker service account access for spark-on-k8s
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argo-spark-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: spark-cluster-cr
subjects:
- kind: ServiceAccount
  name: default
  namespace: argo
ARGO UI
To dig deeper I tried all the steps listed on https://dev.to/crenshaw_dev/how-to-debug-an-argo-workflow-31ng, yet I could not get the app logs.
Basically, when I run these examples I expect the Spark app logs to be printed - in this case the output of the following Scala example:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/GroupByTest.scala
Interestingly, when I list pods I expect to see driver and executor pods, but I always see only one pod, and it has the logs shown in the attached image. Please help me understand why the logs are not generated and how I can get them.
RAW LOGS
$ kubectl logs spark-pi-dag-739246604 -n argo
time="2021-12-10T13:28:09.560Z" level=info msg="Starting Workflow Executor" version="{v3.0.3 2021-05-11T21:14:20Z 02071057c082cf295ab8da68f1b2027ff8762b5a v3.0.3 clean go1.15.7 gc linux/amd64}"
time="2021-12-10T13:28:09.581Z" level=info msg="Creating a docker executor"
time="2021-12-10T13:28:09.581Z" level=info msg="Executor (version: v3.0.3, build_date: 2021-05-11T21:14:20Z) initialized (pod: argo/spark-pi-dag-739246604) with template:\n{\"name\":\"sparkpi\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"resource\":{\"action\":\"create\",\"manifest\":\"apiVersion: \\\"sparkoperator.k8s.io/v1beta2\\\"\\nkind: SparkApplication\\nmetadata:\\n generateName: spark-pi-dag\\nspec:\\n type: Scala\\n mode: cluster\\n image: gjeevanm/spark:v3.1.1\\n imagePullPolicy: Always\\n mainClass: org.apache.spark.examples.SparkPi\\n mainApplicationFile: local:///opt/spark/spark-examples_2.12-3.1.1-hadoop-2.7.jar\\n sparkVersion: 3.1.1\\n driver:\\n cores: 1\\n coreLimit: \\\"1200m\\\"\\n memory: \\\"512m\\\"\\n labels:\\n version: 3.0.0\\n executor:\\n cores: 1\\n instances: 1\\n memory: \\\"512m\\\"\\n labels:\\n version: 3.0.0\\n\"},\"archiveLocation\":{\"archiveLogs\":true,\"s3\":{\"endpoint\":\"minio:9000\",\"bucket\":\"my-bucket\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"my-minio-cred\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"my-minio-cred\",\"key\":\"secretkey\"},\"key\":\"spark-pi-dag/spark-pi-dag-739246604\"}}}"
time="2021-12-10T13:28:09.581Z" level=info msg="Loading manifest to /tmp/manifest.yaml"
time="2021-12-10T13:28:09.581Z" level=info msg="kubectl create -f /tmp/manifest.yaml -o json"
time="2021-12-10T13:28:10.348Z" level=info msg=argo/SparkApplication.sparkoperator.k8s.io/spark-pi-daghhl6s
time="2021-12-10T13:28:10.348Z" level=info msg="Starting SIGUSR2 signal monitor"
time="2021-12-10T13:28:10.348Z" level=info msg="No output parameters"
As Michael mentioned in his answer, Argo Workflows does not know how other CRDs (such as the SparkApplication you used) work, and thus cannot pull the logs from the pods created by that particular CRD.
However, you can add the label workflows.argoproj.io/workflow: {{workflow.name}} to the pods generated by the SparkApplication to let Argo Workflows know about them, and then use argo logs -c <container-name> to pull the logs from those pods.
You can find an example here - it uses a Kubeflow CRD, but in your case you'll want to add the labels to the executor and driver in your SparkApplication CRD in the resource template: https://github.com/argoproj/argo-workflows/blob/master/examples/k8s-resource-log-selector.yaml
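For instance, the manifest inside the resource template could carry the label on both pod types. This fragment is a sketch showing only the parts to change relative to the question's SparkApplication:

```
driver:
  labels:
    workflows.argoproj.io/workflow: "{{workflow.name}}"
executor:
  labels:
    workflows.argoproj.io/workflow: "{{workflow.name}}"
```

With the labels in place, argo logs <workflow-name> -c <container-name> should be able to select those pods.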
Argo Workflows' resource templates (like your spark-groupby template) are simplistic. The Workflow controller is running kubectl create, and that's where its involvement in the SparkApplication ends.
The logs you're seeing from the Argo Workflow pod describe the kubectl create process. Your resource is written to a temporary yaml file and then applied to the cluster.
time="2021-12-10T13:28:09.581Z" level=info msg="Loading manifest to /tmp/manifest.yaml"
time="2021-12-10T13:28:09.581Z" level=info msg="kubectl create -f /tmp/manifest.yaml -o json"
time="2021-12-10T13:28:10.348Z" level=info msg=argo/SparkApplication.sparkoperator.k8s.io/spark-pi-daghhl6s
Old answer:
To view the logs generated by your SparkApplication, you'll need to
follow the Spark docs. I'm not familiar, but I'm guessing the
application gets run in a Pod somewhere. If you can find that pod, you
should be able to view the logs with kubectl logs.
It would be really cool if Argo Workflows could pull Spark logs into
its UI. But building a generic solution would probably be
prohibitively difficult.
Update:
Check Yuan's answer. There's a way to pull the Spark logs into the Workflows CLI!

Kubectl delete tls when no namespace

There was a namespace "sandbox" on the cluster which was deleted, but there is still a challenge for a certificate "echo-tls".
Since the namespace is gone, I can no longer access it to delete this cert.
Could anyone help me delete this resource?
Here are the logs of the cert-manager :
Found status change for Certificate "echo-tls" condition "Ready": "True" -> "False"; setting lastTransitionTime to...
cert-manager/controller/CertificateReadiness "msg"="re-queuing item due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"echo-tls\": StorageError: invalid object, Code: 4, Key: /cert-manager.io/certificates/sandbox/echo-tls, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: ..., UID in object meta: " "key"="sandbox/echo-tls"
After restarting the pod cert-manager here are the logs :
cert-manager/controller/certificaterequests/handleOwnedResource "msg"="error getting referenced owning resource" "error"="certificaterequest.cert-manager.io \"echo-tls-bkmm8\" not found" "related_resource_kind"="CertificateRequest" "related_resource_name"="echo-tls-bkmm8" "related_resource_namespace"="sandbox" "resource_kind"="Order" "resource_name"="echo-tls-bkmm8-1177139468" "resource_namespace"="sandbox" "resource_version"="v1"
cert-manager/controller/orders "msg"="re-queuing item due to error processing" "error"="ACME client for issuer not initialised/available" "key"="sandbox/echo-tls-dwpt4-1177139468"
And then the same logs as before
The issuer :
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: ***
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress: {}
The configs for deployment :
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: <APP_NAME>
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: nginx-<ENV>
    acme.cert-manager.io/http01-ingress-class: nginx-<ENV>
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - ***.fr
    secretName: <APP_NAME>-tls
  rules:
  - host: ***.fr
    http:
      paths:
      - backend:
          serviceName: <APP_NAME>
          servicePort: 80
.k8s_config: &k8s_config
  before_script:
    - export HOME=/tmp
    - export K8S_NAMESPACE="${APP_NAME}"
    - kubectl config set-cluster k8s --server="${K8S_SERVER}"
    - kubectl config set clusters.k8s.certificate-authority-data ${K8S_CA_DATA}
    - kubectl config set-credentials default --token="${K8S_USER_TOKEN}"
    - kubectl config set-context default --cluster=k8s --user=default --namespace=default
    - kubectl config set-context ${K8S_NAMESPACE} --cluster=k8s --user=default --namespace=${K8S_NAMESPACE}
    - kubectl config use-context default
    - if [ -z `kubectl get namespace ${K8S_NAMESPACE} --no-headers --output=go-template={{.metadata.name}} 2>/dev/null` ]; then kubectl create namespace ${K8S_NAMESPACE}; fi
    - if [ -z `kubectl --namespace=${K8S_NAMESPACE} get secret *** --no-headers --output=go-template={{.metadata.name}} 2>/dev/null` ]; then kubectl get secret *** --output yaml | sed "s/namespace:\ default/namespace:\ ${K8S_NAMESPACE}/" | kubectl create -f - ; fi
    - kubectl config use-context ${K8S_NAMESPACE}
Usually certificates are stored inside Kubernete secrets: https://kubernetes.io/docs/concepts/configuration/secret/#tls-secrets. You can retrieve secrets using kubectl get secrets --all-namespaces. You can also check which secrets are used by a given pod by checking its yaml description: kubectl get pods -n <pod-namespace> -o yaml (additional informations: https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-files-from-a-pod)
A namespace is cluster-wide, it is not located on any node. So deleting a node does not delete any namespace.
If the above tracks do not fit your need, could you please provide some yaml files and command-line instructions that would allow reproducing the problem?
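If the namespace or the cert-manager objects still exist in some form, the following is a sketch of commands to locate and remove the leftovers (resource names are taken from the logs above; the resource types assume a standard cert-manager install):

```
# Is the namespace really gone, or stuck terminating?
kubectl get namespace sandbox

# Look for leftover cert-manager objects referencing echo-tls anywhere
kubectl get certificates,certificaterequests,orders,challenges \
  --all-namespaces | grep echo-tls

# If the namespace still exists, delete the certificate directly
kubectl delete certificate echo-tls -n sandbox
```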
Finally, this Sunday the cert-manager stopped the challenges on the old TLS certificate without any further action.

How to authenticate and access Kubernetes cluster for devops pipeline?

Normally you'd do ibmcloud login ⇒ ibmcloud ks cluster-config mycluster ⇒ copy and paste the export KUBECONFIG= and then you can run your kubectl commands.
But if this were being done for some automated devops pipeline outside of IBM Cloud, what is the method for authenticating and getting access to the cluster?
You should not copy your kubeconfig to the pipeline. Instead you can create a service account with permissions to a particular namespace and then use its credentials to access the cluster.
What I do is create a service account and role binding like this:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-tez-dev  # account name
  namespace: tez-dev    # namespace
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: tez-dev-full-access  # role
  namespace: tez-dev
rules:
- apiGroups: ["", "extensions", "apps"]
  resources: ["deployments", "replicasets", "pods", "services"]  # resources to which permissions are granted
  verbs: ["*"]  # what actions are allowed
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: tez-dev-view
  namespace: tez-dev
subjects:
- kind: ServiceAccount
  name: gitlab-tez-dev
  namespace: tez-dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: tez-dev-full-access
Then you can get the token for the service account using:
kubectl describe secrets -n <namespace> gitlab-tez-dev-token-<value>
The output:
Name: gitlab-tez-dev-token-lmlwj
Namespace: tez-dev
Labels: <none>
Annotations: kubernetes.io/service-account.name: gitlab-tez-dev
kubernetes.io/service-account.uid: 5f0dae02-7b9c-11e9-a222-0a92bd3a916a
Type: kubernetes.io/service-account-token
Data
====
ca.crt: 1042 bytes
namespace: 7 bytes
token: <TOKEN>
In the above command, <namespace> is the namespace in which you created the account and <value> is the unique suffix you will see when you run
kubectl get secret -n <namespace>
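If you prefer to script this rather than read the describe output, the token can be extracted non-interactively. This is a sketch: it assumes the token secret is listed on the service account, which is pre-1.24 behaviour (newer clusters no longer auto-create these secrets):

```
SECRET=$(kubectl get serviceaccount gitlab-tez-dev -n tez-dev \
  -o jsonpath='{.secrets[0].name}')
TOKEN=$(kubectl get secret "$SECRET" -n tez-dev \
  -o jsonpath='{.data.token}' | base64 -d)
echo "$TOKEN"
```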
Copy the token to your pipeline environment variables or configuration and then you can access it in the pipeline. For example, in gitlab I do (only the part that is relevant here):
k8s-deploy-stage:
  stage: deploy
  image: lwolf/kubectl_deployer:latest
  services:
    - docker:dind
  only:
    refs:
      - dev
  script:
    ######## CREATE THE KUBECFG ##########
    - kubectl config set-cluster ${K8S_CLUSTER_NAME} --server=${K8S_URL}
    - kubectl config set-credentials gitlab-tez-dev --token=${TOKEN}
    - kubectl config set-context tez-dev-context --cluster=${K8S_CLUSTER_NAME} --user=gitlab-tez-dev --namespace=tez-dev
    - kubectl config use-context tez-dev-context
    ####### NOW COMMANDS WILL BE EXECUTED AS THE SERVICE ACCOUNT #########
    - kubectl apply -f deployment.yml
    - kubectl apply -f service.yml
    - kubectl rollout status -f deployment.yml
The KUBECONFIG environment variable is a list of paths to Kubernetes configuration files that define one or more (switchable) contexts for kubectl (https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/).
Copy your Kubernetes configuration file to your pipeline agent (~/.kube/config by default) and optionally set the KUBECONFIG environment variable. If you got different contexts in your config file, you may want to remove the ones you don't need in your pipeline before copying it or switch contexts using kubectl config use-context.
Everything you need to connect to your kube api server is inside that config: certs, tokens, etc.
If you don't want to copy a token into a file, or want to automate the retrieval of the token, you can also execute some POST requests in order to programmatically retrieve your user token.
The full docs for this are here: https://cloud.ibm.com/docs/containers?topic=containers-cs_cli_install#kube_api
The key piece is retrieving your id token with the POST https://iam.bluemix.net/identity/token call.
The body will return an id_token that you can use in your Kubernetes API calls.
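A sketch of that call with curl; the exact form fields are documented at the link above, and the API key variable is an input you supply:

```
curl -s -X POST "https://iam.bluemix.net/identity/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -H "Accept: application/json" \
  -d "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
  -d "apikey=${IBMCLOUD_API_KEY}"
# Per the linked docs, the JSON response body includes the id_token
# to use in your Kubernetes API calls.
```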

Enabling ExpandPersistentVolumes

I need to resize a bunch of PVCs. It seems the easiest way to do it is through
the ExpandPersistentVolumes feature. I am however having trouble getting the
configuration to cooperate.
The ExpandPersistentVolumes feature gate is set in kubelet on all three
masters, as shown:
(output trimmed to relevant bits for sanity)
$ parallel-ssh -h /tmp/masters -P "ps aux | grep feature"
172.20.53.249: root 15206 7.4 0.5 619888 83952 ? Ssl 19:52 0:02 /opt/kubernetes/bin/kubelet --feature-gates=ExpandPersistentVolumes=true,ExperimentalCriticalPodAnnotation=true
[1] 12:53:08 [SUCCESS] 172.20...
172.20.58.111: root 17798 4.5 0.5 636280 87328 ? Ssl 19:51 0:04 /opt/kubernetes/bin/kubelet --feature-gates=ExpandPersistentVolumes=true,ExperimentalCriticalPodAnnotation=true
[2] 12:53:08 [SUCCESS] 172.20...
172.20.53.240: root 9287 4.0 0.5 645276 90528 ? Ssl 19:50 0:06 /opt/kubernetes/bin/kubelet --feature-gates=ExpandPersistentVolumes=true,ExperimentalCriticalPodAnnotation=true
[3] 12:53:08 [SUCCESS] 172.20..
The apiserver has the PersistentVolumeClaimResize admission controller, as shown:
$ kubectl --namespace=kube-system get pod -o yaml | grep -i admission
/usr/local/bin/kube-apiserver --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction,PersistentVolumeClaimResize,ResourceQuota
/usr/local/bin/kube-apiserver --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction,PersistentVolumeClaimResize,ResourceQuota
/usr/local/bin/kube-apiserver --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction,PersistentVolumeClaimResize,ResourceQuota
However, when I create or edit a storage class to add allowVolumeExpansion,
it is removed on save. For example:
$ cat new-sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: null
  labels:
    k8s-addon: storage-aws.addons.k8s.io
  name: gp2-2
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2
parameters:
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-west-2:<omitted>
  type: gp2
  zone: us-west-2a
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
allowVolumeExpansion: true
$ kubectl create -f new-sc.yaml
storageclass "gp2-2" created
$ kubectl get sc gp2-2 -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: 2018-05-22T20:00:17Z
  labels:
    k8s-addon: storage-aws.addons.k8s.io
  name: gp2-2
  resourceVersion: "2546166"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2-2
  uid: <omitted>
parameters:
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-west-2:<omitted>
  type: gp2
  zone: us-west-2a
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
What am I missing? What is erasing this key from my storageclass configuration?
EDIT: Here is the command used by the kube-apiserver pods. It does not say anything about feature gates. The cluster was launched using Kops.
- /bin/sh
- -c
- mkfifo /tmp/pipe; (tee -a /var/log/kube-apiserver.log < /tmp/pipe & ) ; exec
/usr/local/bin/kube-apiserver --address=127.0.0.1 --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,NodeRestriction,PersistentVolumeClaimResize,ResourceQuota
--allow-privileged=true --anonymous-auth=false --apiserver-count=3 --authorization-mode=RBAC
--basic-auth-file=/srv/kubernetes/basic_auth.csv --client-ca-file=/srv/kubernetes/ca.crt
--cloud-provider=aws --etcd-cafile=/srv/kubernetes/ca.crt --etcd-certfile=/srv/kubernetes/etcd-client.pem
--etcd-keyfile=/srv/kubernetes/etcd-client-key.pem --etcd-servers-overrides=/events#https://127.0.0.1:4002
--etcd-servers=https://127.0.0.1:4001 --insecure-port=8080 --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
--proxy-client-cert-file=/srv/kubernetes/apiserver-aggregator.cert --proxy-client-key-file=/srv/kubernetes/apiserver-aggregator.key
--requestheader-allowed-names=aggregator --requestheader-client-ca-file=/srv/kubernetes/apiserver-aggregator-ca.cert
--requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User --secure-port=443 --service-cluster-ip-range=100.64.0.0/13
--storage-backend=etcd3 --tls-cert-file=/srv/kubernetes/server.cert --tls-private-key-file=/srv/kubernetes/server.key
--token-auth-file=/srv/kubernetes/known_tokens.csv --v=1 > /tmp/pipe 2>&1
This can happen if you did not enable the alpha feature gate for this option.
Did you set the --feature-gates option on kube-apiserver?
--feature-gates mapStringBool - A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
...
ExpandPersistentVolumes=true|false (ALPHA - default=false)
...
Update: If you don't see this option in the command line arguments, you need to add it (--feature-gates=ExpandPersistentVolumes=true).
In case you run kube-apiserver as a pod, you should edit /etc/kubernetes/manifests/kube-apiserver.yaml and add the feature-gate option to other arguments. kube-apiserver will restart automatically.
In case you run kube-apiserver as a process maintained by systemd, you should edit kube-apiserver.service or service options $KUBE_API_ARGS in a separate file, and append feature-gate option there. Restart the service with systemctl restart kube-apiserver.service command.
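For the static-pod case, the change is one extra line in the container command list. This fragment is a sketch of the relevant part of /etc/kubernetes/manifests/kube-apiserver.yaml, not a full manifest:

```
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --feature-gates=ExpandPersistentVolumes=true
    # ...keep all existing flags unchanged...
```

The kubelet watches the manifests directory, so saving the file restarts the apiserver automatically.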
After enabling it, you can create a StorageClass object with allowVolumeExpansion option:
# kubectl get sc -o yaml --export
apiVersion: v1
items:
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    creationTimestamp: 2018-05-23T14:38:43Z
    labels:
      k8s-addon: storage-aws.addons.k8s.io
    name: gp2-2
    namespace: ""
    resourceVersion: "1385"
    selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2-2
    uid: fe516dcf-5e96-11e8-a86d-42010a9a0002
  parameters:
    encrypted: "true"
    kmsKeyId: arn:aws:kms:us-west-2:<omitted>
    type: gp2
    zone: us-west-2a
  provisioner: kubernetes.io/aws-ebs
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""