kubernetes coreos rbd storageclass

I want to use a Kubernetes StorageClass on CoreOS, but it fails.
CoreOS version: stable (1122.2)
Hyperkube version: v1.4.3_coreos.0
The cluster was deployed with the coreos-kubernetes scripts, and I modified rkt_opts for rbd as recommended in kubelet-wrapper.md.
The Ceph version is Jewel; I have mounted an RBD image on CoreOS directly and it works well.
Now I am trying to use a PVC in pods, following the official Kubernetes example: https://github.com/kubernetes/kubernetes/tree/master/examples/experimental/persistent-volume-provisioning
The config files:
**ceph-secret-admin.yaml**
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-admin
  namespace: kube-system
data:
  key: QVFDTEl2NVg5c0U2R1JBQVRYVVVRdUZncDRCV294WUJtME1hcFE9PQ==
**ceph-secret-user.yaml**
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-user
data:
  key: QVFDTEl2NVg5c0U2R1JBQVRYVVVRdUZncDRCV294WUJtME1hcFE9PQ==
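(For context, the key in these Secrets is normally the base64-encoded Ceph client key. A sketch of how such a value is typically produced; client.admin is only an assumption here, since the StorageClass below references its own adminId/userId:)
ceph auth get-key client.admin | base64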
**rbd-storage-class.yaml**
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: kubepool
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: 'true'
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.199.134.2:6789,10.199.134.3:6789,10.199.134.4:6789
  adminId: rbd
  adminSecretName: ceph-secret-admin
  adminSecretNamespace: kube-system
  pool: rbd
  userId: rbd
  userSecretName: ceph-secret-user
**claim1.json :**
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "claim1",
    "annotations": {
      "volume.beta.kubernetes.io/storage-class": "kubepool"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "3Gi"
      }
    }
  }
}
The secrets create OK and the StorageClass seems to create OK, though it cannot be described (no description has been implemented for "StorageClass"). When I create the PVC, its status stays Pending. Describing it gives:
Name: claim1
Namespace: default
Status: Pending
Volume:
Labels: <none>
Capacity:
Access Modes:
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
16m 14s 66 {persistentvolume-controller } Warning ProvisioningFailed no volume plugin matched
Could someone help me?
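(For reference, a few hedged sanity checks that only inspect cluster state and assume nothing beyond the objects shown above:)
# confirm the StorageClass exists and records kubernetes.io/rbd as its provisioner
kubectl get storageclass kubepool -o yaml
# confirm the claim's storage-class annotation matches the StorageClass name exactly
kubectl get pvc claim1 -o yaml
# watch the provisioning events emitted by the controller-manager
kubectl get events --namespace default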

Related

How can I troubleshoot pod stuck at ContainerCreating

I'm trying to troubleshoot a failing pod but I cannot gather enough info to do so. Hoping someone can assist.
[server-001 ~]$ kubectl get pods sandboxed-nginx-98bb68c4d-26ljd
NAME READY STATUS RESTARTS AGE
sandboxed-nginx-98bb68c4d-26ljd 0/1 ContainerCreating 0 18m
[server-001 ~]$ kubectl logs sandboxed-nginx-98bb68c4d-26ljd
Error from server (BadRequest): container "nginx-kata" in pod "sandboxed-nginx-98bb68c4d-26ljd" is waiting to start: ContainerCreating
[server-001 ~]$ kubectl describe pods sandboxed-nginx-98bb68c4d-26ljd
Name: sandboxed-nginx-98bb68c4d-26ljd
Namespace: default
Priority: 0
Node: worker-001/100.100.230.34
Start Time: Fri, 08 Jul 2022 09:41:08 +0000
Labels: name=sandboxed-nginx
pod-template-hash=98bb68c4d
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/sandboxed-nginx-98bb68c4d
Containers:
nginx-kata:
Container ID:
Image: dummy-registry.com/test/nginx:1.17.7
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-887n4 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-887n4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 25m default-scheduler Successfully assigned default/sandboxed-nginx-98bb68c4d-26ljd to worker-001
Warning FailedCreatePodSandBox 5m19s kubelet Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
[worker-001 ~]$ sudo crictl images
IMAGE TAG IMAGE ID SIZE
dummy-registry.com/test/externalip-webhook v1.0.0-1 e2e778d82e6c3 147MB
dummy-registry.com/test/flannel v0.14.1 52e470e10ebf9 209MB
dummy-registry.com/test/kube-proxy v1.22.8 93ab9e5f0c4d6 869MB
dummy-registry.com/test/nginx 1.17.7 db634ca7e0456 310MB
dummy-registry.com/test/pause 3.5 dabdc5fea3665 711kB
dummy-registry.com/test/linux 7-slim 41388a53234b5 140MB
[worker-001 ~]$ sudo crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
b1c6d1bf2f09a db634ca7e045638213d3f68661164aa5c7d5b469631bbb79a8a65040666492d5 34 minutes ago Running nginx 0 3598c2c4d3e88
caaa14b395eb8 e2e778d82e6c3a8cc82cdf3083e55b084869cd5de2a762877640aff1e88659dd 48 minutes ago Running webhook 0 8a9697e2af6a1
4f97ac292753c 52e470e10ebf93ea5d2aa32f5ca2ecfa3a3b2ff8d2015069618429f3bb9cda7a 48 minutes ago Running kube-flannel 2 a4e4d0c14cafc
aacb3ed840065 93ab9e5f0c4d64c135c2e4593cd772733b025f53a9adb06e91fe49f500b634ab 48 minutes ago Running kube-proxy 2 9e0bc036c2d00
[worker-001 ~]$ sudo crictl pods
POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
3598c2c4d3e88 34 minutes ago Ready nginx-9xtss default 0 (default)
8a9697e2af6a1 48 minutes ago Ready externalip-validation-webhook-7988bff847-ntv6d externalip-validation-system 0 (default)
9e0bc036c2d00 48 minutes ago Ready kube-proxy-9c7cb kube-system 0 (default)
a4e4d0c14cafc 48 minutes ago Ready kube-flannel-ds-msz7w kube-system 0 (default)
[worker-001 ~]$ cat /etc/crio/crio.conf
[crio]
[crio.image]
pause_image = "dummy-registry.com/test/pause:3.5"
registries = ["docker.io", "dummy-registry.com/test"]
[crio.network]
plugin_dirs = ["/opt/cni/bin"]
[crio.runtime]
cgroup_manager = "systemd"
conmon_cgroup = "system.slice"
conmon = "/usr/libexec/crio/conmon"
manage_network_ns_lifecycle = true
manage_ns_lifecycle = true
selinux = false
[crio.runtime.runtimes]
[crio.runtime.runtimes.kata]
runtime_path = "/usr/bin/containerd-shim-kata-v2"
runtime_type = "vm"
runtime_root = "/run/vc"
[crio.runtime.runtimes.runc]
runtime_path = "/usr/bin/runc"
runtime_type = "oci"
[worker-001 ~]$ egrep -v '^#|^;|^$' /usr/share/defaults/kata-containers/configuration-qemu.toml
[hypervisor.qemu]
initrd = "/usr/share/kata-containers/kata-containers-initrd.img"
path = "/usr/libexec/qemu-kvm"
kernel = "/usr/share/kata-containers/vmlinuz.container"
machine_type = "q35"
enable_annotations = []
valid_hypervisor_paths = ["/usr/libexec/qemu-kvm"]
kernel_params = ""
firmware = ""
firmware_volume = ""
machine_accelerators=""
cpu_features="pmu=off"
default_vcpus = 1
default_maxvcpus = 0
default_bridges = 1
default_memory = 2048
disable_block_device_use = false
shared_fs = "virtio-9p"
virtio_fs_daemon = "/usr/libexec/kata-qemu/virtiofsd"
valid_virtio_fs_daemon_paths = ["/usr/libexec/kata-qemu/virtiofsd"]
virtio_fs_cache_size = 0
virtio_fs_extra_args = ["--thread-pool-size=1", "-o", "announce_submounts"]
virtio_fs_cache = "auto"
block_device_driver = "virtio-scsi"
enable_iothreads = false
enable_vhost_user_store = false
vhost_user_store_path = "/usr/libexec/qemu-kvm"
valid_vhost_user_store_paths = ["/var/run/kata-containers/vhost-user"]
valid_file_mem_backends = [""]
pflashes = []
valid_entropy_sources = ["/dev/urandom","/dev/random",""]
[factory]
[agent.kata]
kernel_modules=[]
[runtime]
internetworking_model="tcfilter"
disable_guest_seccomp=true
disable_selinux=false
sandbox_cgroup_only=true
static_sandbox_resource_mgmt=false
sandbox_bind_mounts=[]
vfio_mode="guest-kernel"
disable_guest_empty_dir=false
experimental=[]
[image]
[server-001 ~]$ cat nginx.yaml
---
kind: RuntimeClass
apiVersion: node.k8s.io/v1
metadata:
  name: kata-containers
handler: kata
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sandboxed-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      name: sandboxed-nginx
  template:
    metadata:
      labels:
        name: sandboxed-nginx
    spec:
      runtimeClassName: kata-containers
      containers:
        - name: nginx-kata
          image: dummy-registry.com/test/nginx:1.17.7
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: sandboxed-nginx
spec:
  type: NodePort
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  selector:
    name: sandboxed-nginx
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  selector:
    matchLabels:
      name: nginx
  template:
    metadata:
      labels:
        name: nginx
    spec:
      tolerations:
        # this toleration is to have the daemonset runnable on master nodes
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      containers:
        - name: nginx
          image: dummy-registry.com/test/nginx:1.17.7
          ports:
            - containerPort: 80
[server-001 ~]$ kubectl apply -f nginx.yaml
runtimeclass.node.k8s.io/kata-containers unchanged
deployment.apps/sandboxed-nginx created
service/sandboxed-nginx created
daemonset.apps/nginx created
Since you're using Kata Containers with the CRI-O runtime, your pod needs a RuntimeClass reference, which it is missing.
You need to create a RuntimeClass object that points to the installed runtime. See the docs here for how to do that. Also, make sure that the CRI-O setup on worker-001 is correctly configured with Kubernetes. Here is the documentation for that.
Afterwards, add the RuntimeClass parameter to your pod spec so that the container can actually run. The pod is stuck at ContainerCreating because the pod controller cannot run CRI-O based containers unless the RuntimeClass is specified. Here is some documentation on understanding container runtimes.
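For illustration, a minimal sketch of what that looks like, assuming the handler name kata from the crio.conf above (the pod name is made up):
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-containers
handler: kata            # must match the [crio.runtime.runtimes.kata] entry in crio.conf
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-kata-test  # hypothetical name
spec:
  runtimeClassName: kata-containers   # asks the kubelet to run this pod with the kata runtime
  containers:
    - name: nginx
      image: dummy-registry.com/test/nginx:1.17.7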

imagePullSecrets mysteriously get `registry-` prefix

I am trying to deploy the superset Helm chart with a customized image. There's no option to specify imagePullSecrets for the chart. I am using Kubernetes on DigitalOcean. I linked the repository and tested it with a basic deploy, and it "just works": the pods get the correct value for imagePullSecret, and pulling works.
However, when installing the Helm chart, the imagePullSecret that ends up on the pods mysteriously gets a registry- prefix (there's already a -registry suffix, so it becomes registry-xxx-registry when it should just be xxx-registry). The values on the default service account are correct.
To illustrate, default service accounts for both namespaces:
$ kubectl get sa default -n test -o yaml
apiVersion: v1
imagePullSecrets:
- name: xxx-registry
kind: ServiceAccount
metadata:
  creationTimestamp: "2022-04-14T14:26:41Z"
  name: default
  namespace: test
  resourceVersion: "13125"
  uid: xxx-xxx
secrets:
- name: default-token-9ggrm
$ kubectl get sa default -n superset -o yaml
apiVersion: v1
imagePullSecrets:
- name: xxx-registry
kind: ServiceAccount
metadata:
  creationTimestamp: "2022-04-14T14:19:47Z"
  name: default
  namespace: superset
  resourceVersion: "12079"
  uid: xxx-xxx
secrets:
- name: default-token-wkdhv
LGTM, but after trying to install the helm chart (which fails because of registry auth), I can see that the wrong secret is set on the pods:
$ kubectl get -n superset pods -o json | jq '.items[] | {name: .spec.containers[0].name, sa: .spec.serviceAccount, secret: .spec.imagePullSecrets}'
{
  "name": "superset",
  "sa": "default",
  "secret": [
    {
      "name": "registry-xxx-registry"
    }
  ]
}
{
  "name": "xxx-superset-postgresql",
  "sa": "default",
  "secret": [
    {
      "name": "xxx-registry"
    }
  ]
}
{
  "name": "redis",
  "sa": "xxx-superset-redis",
  "secret": null
}
{
  "name": "superset",
  "sa": "default",
  "secret": [
    {
      "name": "registry-xxx-registry"
    }
  ]
}
{
  "name": "superset-init-db",
  "sa": "default",
  "secret": [
    {
      "name": "registry-xxx-registry"
    }
  ]
}
In the test namespace the secret name is simply correct. What is extra interesting is that postgres DOES get the correct secret name, and that comes from a Helm dependency. So it seems like something in the superset Helm chart is causing this, but no imagePullSecrets values are set anywhere in the templates, and as you can see above, the pods use the default service account.
I have already tried destroying and recreating the whole cluster, but the problem recurs.
I have tried version 0.5.10 (latest) of the Helm chart and version 0.3.5; both result in the same issue.
https://github.com/apache/superset/tree/dafc841e223c0f01092a2e116888a3304142e1b8/helm/superset
https://github.com/apache/superset/tree/1.3/helm/superset
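(No answer is shown here, but a hedged way to narrow down where the prefix comes from is to render the chart locally and compare it with what the cluster injects; the release name superset and the chart path are assumptions based on the links above:)
# render the chart without installing it and look for any pull secret set by the templates
helm template superset ./helm/superset | grep -n -B2 -A2 imagePullSecrets
# compare with the secret the default service account would inject
kubectl get sa default -n superset -o jsonpath='{.imagePullSecrets[*].name}'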

MountVolume.SetUp failed for volume "deployer-conf" : object "pgo"/"pgo-deployer-cm" not registered

I am trying to install the Crunchy Data postgres-operator.
My pgo-deploy pod is failing with an error.
I set up default NFS storage by running the following commands:
# kubectl create -f rbac.yaml
The content is:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: pgo
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: pgo
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: pgo
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: pgo
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: pgo
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
# kubectl create -f class.yaml
The content:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner # or choose another name, must match deployment's env PROVISIONER_NAME'
parameters:
  archiveOnDelete: "false"
# kubectl create -f deployment.yaml
The content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: pgo
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: k8s.gcr.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              value: 192.168.10.114
            - name: NFS_PATH
              value: /var/nfs/general
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.10.114
            path: /var/nfs/general
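(As an aside, a throwaway claim like the sketch below can be used to confirm that the provisioner and the default StorageClass actually work before installing the operator; the claim name is made up:)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-test-claim   # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
# the claim should reach status Bound via the default managed-nfs-storage class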
Now when I apply # kubectl apply -f postgres-operator.yml with my configuration:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pgo-deployer-sa
  namespace: pgo
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pgo-deployer-cr
rules:
  - apiGroups:
      - ''
    resources:
      - namespaces
    verbs:
      - get
      - list
      - create
      - patch
      - delete
  - apiGroups:
      - ''
    resources:
      - pods
    verbs:
      - list
  - apiGroups:
      - ''
    resources:
      - secrets
    verbs:
      - list
      - get
      - create
      - delete
  - apiGroups:
      - ''
    resources:
      - configmaps
      - services
      - persistentvolumeclaims
    verbs:
      - get
      - create
      - delete
      - list
  - apiGroups:
      - ''
    resources:
      - serviceaccounts
    verbs:
      - get
      - create
      - delete
      - patch
      - list
  - apiGroups:
      - apps
      - extensions
    resources:
      - deployments
      - replicasets
    verbs:
      - get
      - list
      - watch
      - create
      - delete
  - apiGroups:
      - apiextensions.k8s.io
    resources:
      - customresourcedefinitions
    verbs:
      - get
      - create
      - delete
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - clusterroles
      - clusterrolebindings
      - roles
      - rolebindings
    verbs:
      - get
      - create
      - delete
      - bind
      - escalate
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - roles
    verbs:
      - create
      - delete
  - apiGroups:
      - batch
    resources:
      - jobs
    verbs:
      - delete
      - list
  - apiGroups:
      - crunchydata.com
    resources:
      - pgclusters
      - pgreplicas
      - pgpolicies
      - pgtasks
    verbs:
      - delete
      - list
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: pgo-deployer-cm
  namespace: pgo
data:
  values.yaml: |-
    # =====================
    # Configuration Options
    # More info for these options can be found in the docs
    # https://access.crunchydata.com/documentation/postgres-operator/latest/installation/configuration/
    # =====================
    archive_mode: "true"
    archive_timeout: "60"
    backrest_aws_s3_bucket: ""
    backrest_aws_s3_endpoint: ""
    backrest_aws_s3_key: ""
    backrest_aws_s3_region: ""
    backrest_aws_s3_secret: ""
    backrest_aws_s3_uri_style: ""
    backrest_aws_s3_verify_tls: "true"
    backrest_gcs_bucket: ""
    backrest_gcs_endpoint: ""
    backrest_gcs_key_type: ""
    backrest_port: "2022"
    badger: "false"
    ccp_image_prefix: "registry.developers.crunchydata.com/crunchydata"
    ccp_image_pull_secret: ""
    ccp_image_pull_secret_manifest: ""
    ccp_image_tag: "centos8-13.3-4.7.0"
    create_rbac: "true"
    crunchy_debug: "false"
    db_name: ""
    db_password_age_days: "0"
    db_password_length: "24"
    db_port: "5432"
    db_replicas: "0"
    db_user: "testuser"
    default_instance_memory: "128Mi"
    default_pgbackrest_memory: "48Mi"
    default_pgbouncer_memory: "24Mi"
    default_exporter_memory: "24Mi"
    delete_operator_namespace: "false"
    delete_watched_namespaces: "false"
    disable_auto_failover: "false"
    disable_fsgroup: "false"
    reconcile_rbac: "true"
    exporterport: "9187"
    metrics: "false"
    namespace: "pgo"
    namespace_mode: "dynamic"
    pgbadgerport: "10000"
    pgo_add_os_ca_store: "false"
    pgo_admin_password: "examplepassword"
    pgo_admin_perms: "*"
    pgo_admin_role_name: "pgoadmin"
    pgo_admin_username: "admin"
    pgo_apiserver_port: "8443"
    pgo_apiserver_url: "https://postgres-operator"
    pgo_client_cert_secret: "pgo.tls"
    pgo_client_container_install: "false"
    pgo_client_install: "true"
    pgo_client_version: "4.7.0"
    pgo_cluster_admin: "false"
    pgo_disable_eventing: "false"
    pgo_disable_tls: "false"
    pgo_image_prefix: "registry.developers.crunchydata.com/crunchydata"
    pgo_image_pull_secret: ""
    pgo_image_pull_secret_manifest: ""
    pgo_image_tag: "centos8-4.7.0"
    pgo_installation_name: "devtest"
    pgo_noauth_routes: ""
    pgo_operator_namespace: "pgo"
    pgo_tls_ca_store: ""
    pgo_tls_no_verify: "false"
    pod_anti_affinity: "preferred"
    pod_anti_affinity_pgbackrest: ""
    pod_anti_affinity_pgbouncer: ""
    scheduler_timeout: "3600"
    service_type: "ClusterIP"
    sync_replication: "false"
    backrest_storage: "nfsstorage"
    backup_storage: "nfsstorage"
    primary_storage: "nfsstorage"
    replica_storage: "nfsstorage"
    pgadmin_storage: "nfsstorage"
    wal_storage: ""
    storage1_name: "default"
    storage1_access_mode: "ReadWriteOnce"
    storage1_size: "1G"
    storage1_type: "dynamic"
    storage2_name: "hostpathstorage"
    storage2_access_mode: "ReadWriteMany"
    storage2_size: "1G"
    storage2_type: "create"
    storage3_name: "nfsstorage"
    storage3_access_mode: "ReadWriteMany"
    storage3_size: "10Gi"
    storage3_type: "create"
    storage3_supplemental_groups: "65534"
    storage4_name: "nfsstoragered"
    storage4_access_mode: "ReadWriteMany"
    storage4_size: "1G"
    storage4_match_labels: "crunchyzone=red"
    storage4_type: "create"
    storage4_supplemental_groups: "65534"
    storage5_name: "storageos"
    storage5_access_mode: "ReadWriteOnce"
    storage5_size: "5Gi"
    storage5_type: "dynamic"
    storage5_class: "fast"
    storage6_name: "primarysite"
    storage6_access_mode: "ReadWriteOnce"
    storage6_size: "4G"
    storage6_type: "dynamic"
    storage6_class: "primarysite"
    storage7_name: "alternatesite"
    storage7_access_mode: "ReadWriteOnce"
    storage7_size: "4G"
    storage7_type: "dynamic"
    storage7_class: "alternatesite"
    storage8_name: "gce"
    storage8_access_mode: "ReadWriteOnce"
    storage8_size: "300M"
    storage8_type: "dynamic"
    storage8_class: "standard"
    storage9_name: "rook"
    storage9_access_mode: "ReadWriteOnce"
    storage9_size: "1Gi"
    storage9_type: "dynamic"
    storage9_class: "rook-ceph-block"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pgo-deployer-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pgo-deployer-cr
subjects:
  - kind: ServiceAccount
    name: pgo-deployer-sa
    namespace: pgo
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pgo-deploy
  namespace: pgo
spec:
  backoffLimit: 0
  template:
    metadata:
      name: pgo-deploy
    spec:
      serviceAccountName: pgo-deployer-sa
      restartPolicy: Never
      containers:
        - name: pgo-deploy
          image: registry.developers.crunchydata.com/crunchydata/pgo-deployer:centos8-4.7.0
          imagePullPolicy: IfNotPresent
          env:
            - name: DEPLOY_ACTION
              value: install
          volumeMounts:
            - name: deployer-conf
              mountPath: "/conf"
      volumes:
        - name: deployer-conf
          configMap:
            name: pgo-deployer-cm
I get the following error:
# kubectl get pods -n pgo
NAME READY STATUS RESTARTS AGE
nfs-client-provisioner-7d485f5b8d-cnt57 1/1 Running 0 28m
pgo-deploy--1-ppzkw 0/1 Error 0 10m
# kubectl describe pod -n pgo pgo-deploy--1-ppzkw returns the following error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m13s default-scheduler Successfully assigned pgo/pgo-deploy--1-ppzkw to dfsworker1
Normal Pulled 9m11s kubelet Container image "registry.developers.crunchydata.com/crunchydata/pgo-deployer:centos8-4.7.1" already present on machine
Normal Created 9m10s kubelet Created container pgo-deploy
Normal Started 9m10s kubelet Started container pgo-deploy
Warning FailedMount 8m58s (x3 over 9m) kubelet MountVolume.SetUp failed for volume "deployer-conf" : object "pgo"/"pgo-deployer-cm" not registered
I even tried # kubectl apply -f https://raw.githubusercontent.com/CrunchyData/postgres-operator/v4.7.1/installers/kubectl/postgres-operator.yml
and it gives the same error. # kubectl -n pgo logs -f pgo-deploy--1-ppzkw shows the following error:
TASK [pgo-operator : Create PGClusters CRD] ************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["kubectl", "create", "-f", "/ansible/postgres-operator/roles/pgo-operator/files/crds/pgclusters-crd.yaml"], "delta": "0:00:02.599141", "end": "2021-08-09 08:24:50.295545", "msg": "non-zero return code", "rc": 1, "start": "2021-08-09 08:24:47.696404", "stderr": "error: unable to recognize \"/ansible/postgres-operator/roles/pgo-operator/files/crds/pgclusters-crd.yaml\": no matches for kind \"CustomResourceDefinition\" in version \"apiextensions.k8s.io/v1beta1\"", "stderr_lines": ["error: unable to recognize \"/ansible/postgres-operator/roles/pgo-operator/files/crds/pgclusters-crd.yaml\": no matches for kind \"CustomResourceDefinition\" in version \"apiextensions.k8s.io/v1beta1\""], "stdout": "", "stdout_lines": []}
PLAY RECAP *********************************************************************
localhost : ok=21 changed=5 unreachable=0 failed=1 skipped=17 rescued=0 ignored=0
Can anyone help me solve this? All my machines run Ubuntu 20.04. Everything was working with the same configuration and steps a few days ago, until I deleted the pgo namespace and repeated all of my previous steps.
My Kubernetes version: v1.22.0.
The error you provided says what is wrong:
error: unable to recognize \"/ansible/postgres-operator/roles/pgo-operator/files/crds/pgclusters-crd.yaml\": no matches for kind \"CustomResourceDefinition\" in version \"apiextensions.k8s.io/v1beta1\"
CustomResourceDefinition is no longer served from the beta API:
kubectl explain CustomResourceDefinition
KIND: CustomResourceDefinition
VERSION: apiextensions.k8s.io/v1
Ideally, the vendor of that operator already ships up-to-date CustomResourceDefinitions; in your case, the latest copy seems to be available over here.
If your CRD is outdated, there may be other changes you will want to pull from Crunchy's latest release as well.
Otherwise, we may consider rewriting those objects ourselves:
change apiVersion to apiextensions.k8s.io/v1
fix the spec so it complies with the current schema
spec.additionalPrinterColumns, spec.subresources and spec.validation need to move into a spec.versions array. You no longer have to define a schema for your resource's metadata, if you had configured one in your CRD.
The new layout would look something like this:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crname.api-group
spec:
  group: api-group
  names:
    kind: CrName
    listKind: CrNameList
    plural: crnames
    singular: crname
  scope: Namespaced
  versions:
    - name: v1
      additionalPrinterColumns:
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp
      schema:
        openAPIV3Schema:
          properties:
            apiVersion:
              type: string
            kind:
              type: string
            spec:
              properties:
                [...]
              type: object
      served: true
      storage: true
      subresources:
        status: {}
    - name: v1beta1
      [...]
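(As a quick check, one can confirm which apiextensions versions the cluster still serves; on v1.22 only the v1 API should be listed:)
kubectl api-versions | grep apiextensions.k8s.io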

Using a RabbitMQ queue to drive HPA, access to custom.metrics fails

The metric can be accessed successfully through the API; it clearly returns data via /apis/custom.metrics.k8s.io/v1beta1:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/rabbitmq-exporter/rabbitmq_queue_messages_ready?metricLabelSelector=queue%3Dtest-1 | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/rabbitmq-exporter/rabbitmq_queue_messages_ready"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Service",
        "namespace": "default",
        "name": "rabbitmq-exporter",
        "apiVersion": "/v1"
      },
      "metricName": "rabbitmq_queue_messages_ready",
      "timestamp": "2020-02-17T13:50:20Z",
      "value": "14",
      "selector": null
    }
  ]
}
HPA file
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: test
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Object
      object:
        metric:
          name: "rabbitmq_queue_messages_ready"
          selector:
            matchLabels:
              "queue": "test-1"
        describedObject:
          apiVersion: "custom.metrics.k8s.io/v1beta1"
          kind: Service
          name: rabbitmq-exporter
        target:
          type: Value
          value: 4
Error message
Name: test-hpa
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"autoscaling/v2beta2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"test-hpa","namespace":"defa...
CreationTimestamp: Mon, 17 Feb 2020 21:38:08 +0800
Reference: Deployment/test
Metrics: ( current / target )
"rabbitmq_queue_messages_ready" on Service/rabbitmq-exporter (target value): <unknown> / 4
Min replicas: 1
Max replicas: 5
Deployment pods: 1 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetObjectMetric the HPA was unable to compute the replica count: unable to get metric rabbitmq_queue_messages_ready: Service on default rabbitmq-exporter/object metrics are not yet supported
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedComputeMetricsReplicas 97s (x12 over 4m22s) horizontal-pod-autoscaler Invalid metrics (1 invalid out of 1), last error was: failed to get object metric value: unable to get metric rabbitmq_queue_messages_ready: Service on default rabbitmq-exporter/object metrics are not yet supported
Warning FailedGetObjectMetric 82s (x13 over 4m22s) horizontal-pod-autoscaler unable to get metric rabbitmq_queue_messages_ready: Service on default rabbitmq-exporter/object metrics are not yet supported
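(No answer is shown here, but one thing worth double-checking, offered as an observation rather than a confirmed fix: in autoscaling/v2beta2 the describedObject reference normally carries the API version of the referenced object itself, i.e. v1 for a core Service, rather than the custom.metrics API group. A sketch of that stanza:)
describedObject:
  apiVersion: v1
  kind: Service
  name: rabbitmq-exporter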

Scale deployment based on custom metric

I'm trying to scale a deployment based on a custom metric coming from a custom metrics server. I deployed my server, and when I run
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/kubernetes/test-metric"
I get back this JSON
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/kubernetes/test-metric"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Service",
        "namespace": "default",
        "name": "kubernetes",
        "apiVersion": "/v1"
      },
      "metricName": "test-metric",
      "timestamp": "2019-01-26T02:36:19Z",
      "value": "300m",
      "selector": null
    }
  ]
}
Then I created my hpa.yml using this
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: test-all-deployment
  namespace: default
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-all-deployment
  metrics:
    - type: Object
      object:
        target:
          kind: Service
          name: kubernetes
          apiVersion: custom.metrics.k8s.io/v1beta1
        metricName: test-metric
        targetValue: 200m
but it doesn't scale and I'm not sure what is wrong. Running kubectl get hpa returns:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
test-all-deployment Deployment/test-all-deployment <unknown>/200m 1 10 1 9m
The part I'm not sure about is the target object in the metrics collection of the HPA definition. Looking at the doc here https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
it has:
describedObject:
  apiVersion: extensions/v1beta1
  kind: Ingress
  name: main-route
target:
  kind: Value
  value: 10k
but that gives me a validation error for the v2beta1 API, and looking at the actual object here https://github.com/kubernetes/api/blob/master/autoscaling/v2beta1/types.go#L296 it doesn't seem to match. I don't know how to specify this with the v2beta1 API.
It looks like there is a mistake in the documentation: in the same example, two different API versions are used.
autoscaling/v2beta1 notation:
- type: Pods
  pods:
    metric:
      name: packets-per-second
    targetAverageValue: 1k
autoscaling/v2beta2 notation:
- type: Resource
  resource:
    name: cpu
    target:
      type: AverageUtilization
      averageUtilization: 50
There is a difference between autoscaling/v2beta1 and autoscaling/v2beta2 APIs:
kubectl get hpa.v2beta1.autoscaling -o yaml --export > hpa2b1-export.yaml
kubectl get hpa.v2beta2.autoscaling -o yaml --export > hpa2b2-export.yaml
diff -y hpa2b1-export.yaml hpa2b2-export.yaml
#hpa.v2beta1.autoscaling hpa.v2beta2.autoscaling
#-----------------------------------------------------------------------------------
apiVersion: v1 apiVersion: v1
items: items:
- apiVersion: autoscaling/v2beta1 | - apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler kind: HorizontalPodAutoscaler
metadata: metadata:
creationTimestamp: "2019-03-21T13:17:47Z" creationTimestamp: "2019-03-21T13:17:47Z"
name: php-apache name: php-apache
namespace: default namespace: default
resourceVersion: "8441304" resourceVersion: "8441304"
selfLink: /apis/autoscaling/v2beta1/namespaces/default/ho | selfLink: /apis/autoscaling/v2beta2/namespaces/default/ho
uid: b8490a0a-4bdb-11e9-9043-42010a9c0003 uid: b8490a0a-4bdb-11e9-9043-42010a9c0003
spec: spec:
maxReplicas: 10 maxReplicas: 10
metrics: metrics:
- resource: - resource:
name: cpu name: cpu
targetAverageUtilization: 50 | target:
> averageUtilization: 50
> type: Utilization
type: Resource type: Resource
minReplicas: 1 minReplicas: 1
scaleTargetRef: scaleTargetRef:
apiVersion: extensions/v1beta1 apiVersion: extensions/v1beta1
kind: Deployment kind: Deployment
name: php-apache name: php-apache
status: status:
conditions: conditions:
- lastTransitionTime: "2019-03-21T13:18:02Z" - lastTransitionTime: "2019-03-21T13:18:02Z"
message: recommended size matches current size message: recommended size matches current size
reason: ReadyForNewScale reason: ReadyForNewScale
status: "True" status: "True"
type: AbleToScale type: AbleToScale
- lastTransitionTime: "2019-03-21T13:18:47Z" - lastTransitionTime: "2019-03-21T13:18:47Z"
message: the HPA was able to successfully calculate a r message: the HPA was able to successfully calculate a r
resource utilization (percentage of request) resource utilization (percentage of request)
reason: ValidMetricFound reason: ValidMetricFound
status: "True" status: "True"
type: ScalingActive type: ScalingActive
- lastTransitionTime: "2019-03-21T13:23:13Z" - lastTransitionTime: "2019-03-21T13:23:13Z"
message: the desired replica count is increasing faster message: the desired replica count is increasing faster
rate rate
reason: TooFewReplicas reason: TooFewReplicas
status: "True" status: "True"
type: ScalingLimited type: ScalingLimited
currentMetrics: currentMetrics:
- resource: - resource:
currentAverageUtilization: 0 | current:
currentAverageValue: 1m | averageUtilization: 0
> averageValue: 1m
name: cpu name: cpu
type: Resource type: Resource
currentReplicas: 1 currentReplicas: 1
desiredReplicas: 1 desiredReplicas: 1
kind: List kind: List
metadata: metadata:
resourceVersion: "" resourceVersion: ""
selfLink: "" selfLink: ""
Here is how the Object metric definition is supposed to look in each API version:
#hpa.v2beta1.autoscaling
type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: extensions/v1beta1
    kind: Ingress
    name: main-route
  targetValue: 2k
#hpa.v2beta2.autoscaling
type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: extensions/v1beta1
    kind: Ingress
    name: main-route
  target:
    type: Value
    value: 2k
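Applied to the question above, the autoscaling/v2beta2 form would look roughly like this (a sketch only; it assumes the cluster serves autoscaling/v2beta2 and reuses the Deployment and Service names from the question):
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: test-all-deployment
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-all-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Object
      object:
        metric:
          name: test-metric
        describedObject:
          apiVersion: v1          # the Service's own API version
          kind: Service
          name: kubernetes
        target:
          type: Value
          value: 200m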