GKE log errors about gke-metrics-agent and UAS

I'm using a private GKE cluster (Version 1.23.14-gke.1800). I have the following errors in kube-system gke-metrics-agent pod logs:
error uasexporter/exporter.go:190 Error exporting metrics to UAS {"kind": "exporter", "name": "uas", "error": "reading from stream failed: rpc error: code = PermissionDenied desc = The caller does not have permission"}
error uasexporter/exporter.go:226 failed to get response from UAS {"kind": "exporter", "name": "uas", "error": "rpc error: code = PermissionDenied desc = The caller does not have permission"}
app gke-metrics-agent
component gke-metrics-agent
container gke-metrics-agent
filename /var/log/pods/kube-system_gke-metrics-agent-9rbfv_6896b214-31d2-43bb-b15d-a8e1b122d41d/gke-metrics-agent/0.log
job kube-system/gke-metrics-agent
namespace kube-system
node_name gke-gke-production-production-88f13984-h83x
pod gke-metrics-agent-9rbfv
stream stderr
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2022-12-07T10:20:55Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: gke-metrics-agent
  namespace: kube-system
  resourceVersion: "444"
  uid: ...
secrets:
- name: gke-metrics-agent-token-6zhvq
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: "2022-12-07T10:20:56Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: gke-metrics-agent
  resourceVersion: "452"
  uid: ...
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gke-metrics-agent
subjects:
- kind: ServiceAccount
  name: gke-metrics-agent
  namespace: kube-system
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2022-12-07T10:20:56Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: gke-metrics-agent
  resourceVersion: "67979037"
  uid: ...
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - watch
- apiGroups:
  - policy
  resourceNames:
  - gce.gke-metrics-agent
  resources:
  - podsecuritypolicies
  verbs:
  - use
I think gke-metrics-agent is an official DaemonSet that ships automatically with GKE.
It's obviously some kind of permission problem, but I don't even know what UAS means.
I can't find any meaningful information in the GCP documentation or on the Internet.
I tried granting some additional cluster roles (system:gke-uas-metrics-reader, external-metrics-reader) to the current gke-metrics-agent service account, but the problem persists.
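For what it's worth, I granted them roughly like this (the binding names here are just placeholders I picked):
kubectl create clusterrolebinding gke-metrics-agent-uas-reader \
  --clusterrole=system:gke-uas-metrics-reader \
  --serviceaccount=kube-system:gke-metrics-agent
kubectl create clusterrolebinding gke-metrics-agent-external-metrics \
  --clusterrole=external-metrics-reader \
  --serviceaccount=kube-system:gke-metrics-agent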
From time to time I also see the following problems in my cluster:
Kubernetes aggregated API v1beta1.metrics.k8s.io/default is reporting errors
Kubernetes aggregated API v1beta1.metrics.k8s.io/default has been only 75% available over the last 10m
I think they are connected to this issue.
I would be very thankful if someone could give me at least some direction.
Thank you for your time, and excuse my English!

UAS stands for Unified Autoscaling Platform. It provides predictive and scheduled size recommendations to the Autoscaler backend, giving the zonal Autoscaler an additional signal for Predictive Autoscaling and Scheduled Autoscaling.
There is currently a known issue related to UAS, occurring due to a LoggingMonitorConfig issue which Google is working on. For further updates on the issue follow the above link, and post a comment there asking whether there is a workaround for now.
If you find an issue with Google products and want to raise a feature request, use the Public Issue Tracker link.

Related

I am getting a permission issue (cannot create resource "Job" in API group "batch") while creating jobs via sensors of argo-events

I am trying to trigger a job creation from a sensor but I am getting the error below:
Job.batch is forbidden: User \"system:serviceaccount:samplens:sample-sa\" cannot create resource \"Job\" in API group \"batch\" in the namespace \"samplens\"","errorVerbose":"timed out waiting for the condition: Job.batch is forbidden: User \"system:serviceaccount:samplens:sample-sa\" cannot create resource \"Job\" in API group \"batch\" in the namespace \"samplens\"\nfailed to execute trigger\ngithub.com/argoproj/argo-events/sensors.(*SensorContext).triggerOne\n\t/home/jenkins/agent/workspace/argo-events_master/sensors/listener.go:328\ngithub.com/argoproj/argo-events/sensors.(*SensorContext).triggerActions\n\t/home/jenkins/agent/workspace/argo-events_master/sensors/listener.go:269\ngithub.com/argoproj/argo-events/sensors.(*SensorContext).listenEvents.func1.3\n\t/home/jenkins/agent/workspace/argo-events_master/sensors/listener.go:181\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357","triggerName":"sample-job","triggeredBy":["payload"],"triggeredByEvents":["38333939613965312d376132372d343262302d393032662d663731393035613130303130"],"stacktrace":"github.com/argoproj/argo-events/sensors.(*SensorContext).triggerActions\n\t/home/jenkins/agent/workspace/argo-events_master/sensors/listener.go:271\ngithub.com/argoproj/argo-events/sensors.(*SensorContext).listenEvents.func1.3\n\t/home/jenkins/agent/workspace/argo-events_master/sensors/listener.go:181"}
Although I have created a serviceaccount, role and rolebinding.
Here is my serviceaccount creation file:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sample-sa
  namespace: samplens
Here is my rbac.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sample-role
  namespace: samplens
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - create
  - delete
  - get
  - watch
  - patch
- apiGroups:
  - "batch"
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
  - watch
  - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sample-role-binding
  namespace: samplens
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: sample-role
subjects:
- kind: ServiceAccount
  name: sample-sa
  namespace: samplens
and here is my sensor.yaml:
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: webhook
spec:
  template:
    serviceAccountName: sample-sa
  dependencies:
    - name: payload
      eventSourceName: webhook
      eventName: devops-toolkit
  triggers:
    - template:
        name: sample-job
        k8s:
          group: batch
          version: v1
          resource: Job
          operation: create
          source:
            resource:
              apiVersion: batch/v1
              kind: Job
              metadata:
                name: samplejob-crypto
                annotations:
                  argocd.argoproj.io/hook: PreSync
                  argocd.argoproj.io/hook-delete-policy: HookSucceeded
              spec:
                ttlSecondsAfterFinished: 100
                serviceAccountName: sample-sa
                template:
                  spec:
                    serviceAccountName: sample-sa
                    restartPolicy: OnFailure
                    containers:
                      - name: sample-crypto-job
                        image: docker.artifactory.xxx.com/abc/def/yyz:master-b1b347a
The sensor is triggered correctly but fails to create the job.
Can someone please help? What am I missing?
Posting this as a community wiki for better visibility; feel free to edit and expand it.
The original issue was resolved by adjusting the role and giving it * verbs, which means the argo sensor in fact requires more permissions.
This is a working solution for a testing environment, while for production RBAC should follow the principle of least privilege.
How to test RBAC
There is a kubectl syntax that lets you test whether RBAC (service account + role + rolebinding) was set up as expected.
Below is an example of how to check whether SERVICE_ACCOUNT_NAME in NAMESPACE can create jobs in the namespace NAMESPACE:
kubectl auth can-i --as=system:serviceaccount:NAMESPACE:SERVICE_ACCOUNT_NAME create jobs -n NAMESPACE
The answer will be simple: yes or no.
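For example, to check the service account from this question (names taken from the post above):
kubectl auth can-i --as=system:serviceaccount:samplens:sample-sa create jobs -n samplens
If the role and rolebinding are in place, this prints yes; otherwise no.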
Useful links:
Using RBAC authorization
Checking API access
Just ran into the same issue in argo-events. Hopefully this gets fixed in the near future, or at least documented better.
Change the following value in your sensor.yaml:
spec.triggers[0].template.k8s.resource: jobs
The relevant documentation (at this moment) seems to point to some old Kubernetes API v1.13 documentation, so I have no idea why this needs to be written as the plural "jobs", but it solved the issue for me.
In the example trigger, where a Pod is triggered, the value "pods" is used in the same field, which pointed me in the right direction.
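For reference, the corrected trigger section of the sensor.yaml above would then read (only the resource field changed; the source block stays as it was):
triggers:
  - template:
      name: sample-job
      k8s:
        group: batch
        version: v1
        resource: jobs   # was "Job"; the API expects the plural, lowercase resource name
        operation: create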

GitLab CI/CD, Kubernetes, and Private Volumes

I have an application that uses a database. I want to set up a GitLab CI/CD pipeline to deploy my app to a Kubernetes cluster. My issue right now is that I can't seem to get persistent storage to work. My thought processes are as follows:
Create a PersistentVolume -> create a PersistentVolumeClaim -> mount that PVC to my pod running the database.
I am running into the issue that a PV is a system-wide configuration, so GitLab can't seem to create one. If I manage to create a PV before deployment, GitLab only allows me to work with objects within a specific namespace, which means the PVC won't see the PV I created when my pipeline runs.
manifest.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-sql0001
  labels:
    type: amazoneEBS
spec:
  capacity:
    storage: 15Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: <volume ID>
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sql-pvc
  labels:
    type: amazoneEBS
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 15Gi
  selector:
    matchLabels:
      type: "amazoneEBS"
kubectl Error
kubectl apply -f manifest.yaml
persistentvolumeclaim/sql-pvc created
Error from server (Forbidden): error when retrieving current configuration of:
Resource: "/v1, Resource=persistentvolumes", GroupVersionKind: "/v1, Kind=PersistentVolume"
Name: "pv-sql0001", Namespace: ""
from server for: "manifest.yaml": persistentvolumes "pv-sql0001" is forbidden: User "system:serviceaccount:namespace:namespace-service-account" cannot get resource "persistentvolumes" in API group "" at the cluster scope
I tried what was recommended in Rakesh Gupta's post, but I am still getting the same error. Unless I am misunderstanding.
eddy@DESKTOP-1MHAKBA:~$ kubectl describe ClusterRole stateful-site-26554211-CR --namespace=stateful-site-26554211-pr
Name:         stateful-site-26554211-CR
Labels:       <none>
Annotations:  <none>
PolicyRule:
  Resources                      Non-Resource URLs  Resource Names  Verbs
  ---------                      -----------------  --------------  -----
  namespaces                     []                 []              [list watch create]
  nodes                          []                 []              [list watch create]
  persistentvolumes              []                 []              [list watch create]
  storageclasses.storage.k8s.io  []                 []              [list watch create]
eddy@DESKTOP-1MHAKBA:~$ kubectl describe ClusterRoleBinding stateful-site-26554211-CRB --namespace=stateful-site-26554211-production
Name:         stateful-site-26554211-CRB
Labels:       <none>
Annotations:  <none>
Role:
  Kind:  ClusterRole
  Name:  stateful-site-26554211-CR
Subjects:
  Kind            Name                                               Namespace
  ----            ----                                               ---------
  ServiceAccount  stateful-site-26554211-production-service-account  stateful-site-26554211-production
Any insight into how I should do this would be appreciated. I might just be doing this all wrong, and maybe there is a better way. I will be around to answer any questions.
You need to create a ServiceAccount, ClusterRole and ClusterRoleBinding, as PVs and Nodes are cluster-scoped objects (the PVC itself is namespaced).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: <name of your cluster role>
rules:
- apiGroups: [""]
  resources:
  - nodes
  - persistentvolumes
  - namespaces
  verbs: ["list", "watch", "create"]
- apiGroups: ["storage.k8s.io"]
  resources:
  - storageclasses
  verbs: ["list", "watch", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: <name of your cluster role binding>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: <name of your cluster role which should be matched with the previous one>
subjects:
- kind: ServiceAccount
  name: <service account name>
  namespace: <namespace of the service account>  # required for ServiceAccount subjects
Reference: https://stackoverflow.com/a/60617584/2777988
If this does not work, you may try removing the "PersistentVolume" section from your yaml. It looks like your setup doesn't allow PersistentVolume creation; however, the PVC may in turn create a PV through dynamic provisioning.
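For example, a claim that relies on dynamic provisioning needs no PV object at all. A minimal sketch, assuming a StorageClass named gp2 exists in the cluster (check with kubectl get storageclass):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2   # assumed name; use a class that exists in your cluster
  resources:
    requests:
      storage: 15Gi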

Setting up a Kubernetes Cronjob: Cannot get resource "horizontalpodautoscalers" in API group "autoscaling" in the namespace "staging"

I've tried to set up a cron job to patch a horizontal pod autoscaler, but the job returns: horizontalpodautoscalers.autoscaling "my-web-service" is forbidden: User "system:serviceaccount:staging:sa-cron-runner" cannot get resource "horizontalpodautoscalers" in API group "autoscaling" in the namespace "staging"
I've tried setting up all the role permissions as below:
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: staging
  name: cron-runner
rules:
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: cron-runner
  namespace: staging
subjects:
- kind: ServiceAccount
  name: sa-cron-runner
  namespace: staging
roleRef:
  kind: Role
  name: cron-runner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-cron-runner
  namespace: staging
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: scale-up-job
  namespace: staging
spec:
  schedule: "16 20 * * 1-6"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-cron-runner
          containers:
          - name: scale-up-job
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - kubectl patch hpa my-web-service --patch '{"spec":{"minReplicas":6}}'
          restartPolicy: OnFailure
kubectl auth can-i patch horizontalpodautoscalers -n staging --as sa-cron-runner also returns no, so the permissions can't be set up correctly.
How can I debug this? I can't see what is set up incorrectly.
I think the problem is that the cronjob needs to first get the resource and then patch it, so you need to explicitly specify the permission to get in the Role spec.
The error also mentions an authorization problem when getting the resource:
horizontalpodautoscalers.autoscaling "my-web-service" is forbidden: User "system:serviceaccount:staging:sa-cron-runner" cannot get resource "horizontalpodautoscalers" in API group "autoscaling" in the namespace "staging"
Try modifying the verbs part of the Role as:
verbs: ["patch", "get"]
You could also include list and watch, depending on the requirements of the script that runs inside the cronjob. The adjusted Role is shown below.
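For reference, the Role from the question with only the verbs changed:
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: staging
  name: cron-runner
rules:
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["patch", "get"]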

Kubernetes cluster role with permissions to watch events

I'm trying to create a cluster role with permissions to watch events, but it seems that I'm missing something.
I'm using the following:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: watch-events
  namespace: test
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: watch-events-cluster
rules:
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: watch-events-cluster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: watch-events-cluster
subjects:
- kind: ServiceAccount
  name: watch-events
  namespace: test
No matter what I try with kubectl auth can-i watch events --as watch-events, I always get a no.
Am I missing something?
The RBAC is correct and will give cluster-wide permission to watch events across all namespaces, but the kubectl command is incorrect. The command should be:
kubectl auth can-i watch events --as=system:serviceaccount:test:watch-events
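Because the permission comes from a ClusterRoleBinding, the same check should return yes against any namespace, for example:
kubectl auth can-i watch events --as=system:serviceaccount:test:watch-events -n kube-system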
If you are making API calls against the Kubernetes swagger API, you need to specify the events API group properly, with the .k8s.io suffix:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#-strong-api-groups-strong-
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: my-custom-role
  namespace: default
rules:
- apiGroups:
  - ''
  - events.k8s.io
  resources:
  - events
  verbs:
  - '*'
---
https://kubernetes.io/docs/reference/access-authn-authz/rbac/#service-account-permissions
Default RBAC policies grant scoped permissions to control-plane components, nodes, and controllers, but grant no permissions to service accounts outside the kube-system namespace (beyond discovery permissions given to all authenticated users).

Receiving error when calling Kubernetes API from the Pod

I have a simple .NET Standard (4.7.2) application that is containerized. It has a method to list all namespaces in a cluster, and I used the C# Kubernetes client to interact with the API. According to the official documentation, default API-server credentials are created in the pod and used to communicate with the API server, but when calling the Kubernetes API from the pod I get the following error:
Operation returned an invalid status code 'Forbidden'
My deployment yaml is very minimal:
apiVersion: v1
kind: Pod
metadata:
  name: cmd-dotnetstdk8stest
spec:
  nodeSelector:
    kubernetes.io/os: windows
  containers:
  - name: cmd-dotnetstdk8stest
    image: eddyuk/dotnetstdk8stest:1.0.8-cmd
    ports:
    - containerPort: 80
I think you have RBAC activated in your cluster. You need to assign your pod a ServiceAccount bound to a Role that allows it to list all namespaces. When no ServiceAccount is specified in the pod template, the namespace's default ServiceAccount is assigned to the pods running in that namespace.
First, create the Role:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: <YOUR-NAMESPACE>
  name: namespace-reader
rules:
- apiGroups: [""]            # "" indicates the core API group
  resources: ["namespaces"]  # resource is namespaces
  verbs: ["get", "list"]     # allow this role to get and list namespaces
Next, create a new ServiceAccount inside your namespace:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: application-sa
  namespace: <YOUR-NAMESPACE>
Bind the newly created Role to the ServiceAccount:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-namespace-listing
  namespace: <YOUR-NAMESPACE>
subjects:
- kind: ServiceAccount
  name: application-sa      # your newly created ServiceAccount
  namespace: <YOUR-NAMESPACE>
roleRef:
  kind: Role
  name: namespace-reader    # your newly created Role
  apiGroup: rbac.authorization.k8s.io
Finally, assign the new ServiceAccount to your pod by adding it to the pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: podname
  namespace: <YOUR-NAMESPACE>
spec:
  serviceAccountName: application-sa
You can read more about RBAC in the official docs. You may also prefer kubectl commands to YAML definitions, as sketched below.
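As a rough imperative equivalent (same names as the YAML above; a sketch, not tested against your cluster):
kubectl create role namespace-reader --verb=get,list --resource=namespaces -n <YOUR-NAMESPACE>
kubectl create serviceaccount application-sa -n <YOUR-NAMESPACE>
kubectl create rolebinding allow-namespace-listing \
  --role=namespace-reader \
  --serviceaccount=<YOUR-NAMESPACE>:application-sa \
  -n <YOUR-NAMESPACE>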