I have installed Prometheus into my Kubernetes v1.17 KOPS cluster following kube-prometheus, ensuring the --authentication-token-webhook=true and --authorization-mode=Webhook prerequisites are set and the kube-prometheus/kube-prometheus-kops.libsonnet configuration is applied.
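For reference, on KOPS those kubelet flags are normally set through the cluster spec rather than passed directly; a minimal sketch, assuming current KOPS field names and applied via kops edit cluster followed by a rolling update:
spec:
  kubelet:
    # maps to --authentication-token-webhook=true and --authorization-mode=Webhook on each kubelet
    authenticationTokenWebhook: true
    authorizationMode: Webhook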
I have then installed Postgres from https://github.com/helm/charts/tree/master/stable/postgresql using the supplied values-production.yaml with the following set:
metrics:
  enabled: true
  # resources: {}
  service:
    type: ClusterIP
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "9187"
    loadBalancerIP:
  serviceMonitor:
    enabled: true
    namespace: monitoring
    interval: 30s
    scrapeTimeout: 10s
Both services are up and working, but Prometheus doesn't discover any metrics from Postgres.
The logs of the metrics container on my Postgres pods show no errors, and neither do any of the pods in the monitoring namespace.
What additional steps are required to have the Postgres metrics exporter reach Prometheus?
Try updating the ClusterRole for Prometheus. By default, it doesn't have permission to retrieve the list of pods, services, and endpoints in namespaces other than monitoring.
In my system the original ClusterRole was:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
I've changed it to:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
After those changes, Postgres metrics will be available for Prometheus.
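As a quick check (a sketch, assuming the default kube-prometheus service account prometheus-k8s in the monitoring namespace), you can confirm the new permissions before looking at the Targets page in the Prometheus UI:
# both commands should print "yes" once the ClusterRole is updated
kubectl auth can-i list endpoints --as=system:serviceaccount:monitoring:prometheus-k8s --all-namespaces
kubectl auth can-i watch pods --as=system:serviceaccount:monitoring:prometheus-k8s --all-namespaces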
Related
I noticed that a new cluster role, "eks:cloud-controller-manager", appeared in our EKS cluster. We never created it. I tried to find the origin/creation of this cluster role but was not able to.
Any idea what the "eks:cloud-controller-manager" cluster role does in an EKS cluster?
$ kubectl get clusterrole eks:cloud-controller-manager -o yaml
kind: ClusterRole
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"name":"eks:cloud-controller-manager"},"rules":[{"apiGroups":[""],"resources":["events"],"verbs":["create","patch","update"]},{"apiGroups":[""],"resources":["nodes"],"verbs":["*"]},{"apiGroups":[""],"resources":["nodes/status"],"verbs":["patch"]},{"apiGroups":[""],"resources":["services"],"verbs":["list","patch","update","watch"]},{"apiGroups":[""],"resources":["services/status"],"verbs":["list","patch","update","watch"]},{"apiGroups":[""],"resources":["serviceaccounts"],"verbs":["create","get"]},{"apiGroups":[""],"resources":["persistentvolumes"],"verbs":["get","list","update","watch"]},{"apiGroups":[""],"resources":["endpoints"],"verbs":["create","get","list","watch","update"]},{"apiGroups":["coordination.k8s.io"],"resources":["leases"],"verbs":["create","get","list","watch","update"]},{"apiGroups":[""],"resources":["serviceaccounts/token"],"verbs":["create"]}]}
  creationTimestamp: "2022-08-02T00:25:52Z"
  name: eks:cloud-controller-manager
  resourceVersion: "762242250"
  uid: 34e568bb-20b5-4c33-8a7b-fcd081ae0a28
rules:
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - serviceaccounts/token
  verbs:
  - create
I tried to find this object in our GitOps repo but could not find it.
This role is created by AWS when you provision the cluster. It is used by the AWS cloud-controller-manager to integrate AWS services (e.g. CLB/NLB, EBS) with Kubernetes. You will also find other roles, such as eks:fargate-manager, used for the Fargate integration.
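If you want to see which subjects actually use it, you can list the cluster role bindings that reference the role (a sketch using only standard kubectl output formatting):
# print binding name, referenced role and subjects, then filter for the controller manager
kubectl get clusterrolebindings \
  -o custom-columns='NAME:.metadata.name,ROLE:.roleRef.name,SUBJECTS:.subjects[*].name' \
  | grep cloud-controller-manager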
I have a Kubernetes namespace with limited privileges, which exclude the creation of ClusterRoles and ClusterRoleBindings.
I want to monitor the resource consumption and pod-related metrics on the namespace level.
E.g., pod health and status, new pod creation, pod restarts, etc.
I can expose application-level custom metrics by serving /metrics and adding the annotation prometheus.io/scrape: 'true'.
But is there a way to get resource consumption and pod-related metrics at the namespace level without a ClusterRole and ClusterRoleBinding?
It is possible to get namespace-level entities from kube-state-metrics.
Pull the helm chart for kube-state-metrics:
https://bitnami.com/stack/kube-state-metrics/helm
Edit the values.yaml file and make the following changes:
rbac:
  create: false
  useClusterRole: false
collectors:
  - configmaps
  - cronjobs
  - daemonsets
  - deployments
  - endpoints
  - horizontalpodautoscalers
  - ingresses
  - jobs
  - limitranges
  - networkpolicies
  - poddisruptionbudgets
  - pods
  - replicasets
  - resourcequotas
  - services
  - statefulsets
namespace: <current-namespace>
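Then install the chart with those values; a sketch, assuming the Bitnami repo is added under the name bitnami and the release is called kube-state-metrics:
# add the repo and install the namespaced release with the edited values
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install kube-state-metrics bitnami/kube-state-metrics -n <current-namespace> -f values.yaml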
In the Prometheus ConfigMap, add a job with the following configuration:
- job_name: 'kube-state-metrics'
  scrape_interval: 1s
  scrape_timeout: 500ms
  static_configs:
    - targets: ['{{ .Values.kube_state_metrics.service.name }}:8080']
Create a role binding:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: <current-namespace>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: <current-namespace>
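To verify the setup (a sketch, assuming the chart created a Service named kube-state-metrics listening on port 8080), port-forward to the exporter and check that it serves metrics:
# forward the service locally, then fetch a few metric lines from another terminal
kubectl port-forward svc/kube-state-metrics 8080:8080 -n <current-namespace>
curl -s http://localhost:8080/metrics | head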
I am unable to identify the exact permissions issue with my setup, shown below. I've looked into all the similar Q&As but am still unable to solve it. The aim is to deploy Prometheus and let it scrape the /metrics endpoints that my other applications in the cluster already expose fine.
Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:default:default\" cannot list resource \"endpoints\" in API group \"\" at the cluster scope"
Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:default:default\" cannot list resource \"pods\" in API group \"\" at the cluster scope"
Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:default:default\" cannot list resource \"services\" in API group \"\" at the cluster scope"
...
...
The command below returns no for services, nodes, pods, etc.
kubectl auth can-i get services --as=system:serviceaccount:default:default -n default
Minikube
$ minikube start --vm-driver=virtualbox --extra-config=apiserver.Authorization.Mode=RBAC
😄 minikube v1.14.2 on Darwin 11.2
✨ Using the virtualbox driver based on existing profile
👍 Starting control plane node minikube in cluster minikube
🔄 Restarting existing virtualbox VM for "minikube" ...
🐳 Preparing Kubernetes v1.19.2 on Docker 19.03.12 ...
▪ apiserver.Authorization.Mode=RBAC
🔎 Verifying Kubernetes components...
🌟 Enabled addons: storage-provisioner, default-storageclass, dashboard
🏄 Done! kubectl is now configured to use "minikube" by default
Roles
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-cluster-role
rules:
  - apiGroups: [""]
    resources: ["nodes", "services", "pods", "endpoints"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - apiGroups: ["extensions"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: monitoring-service-account
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: monitoring-cluster-role-binding
roleRef:
  kind: ClusterRole
  name: monitoring-cluster-role
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: monitoring-service-account
    namespace: default
Prometheus
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config-map
  namespace: default
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: default
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          ports:
            - name: http
              protocol: TCP
              containerPort: 9090
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus/
            - name: storage
              mountPath: /prometheus/
      volumes:
        - name: config
          configMap:
            name: prometheus-config-map
        - name: storage
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
  namespace: default
spec:
  type: NodePort
  selector:
    app: prometheus
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 9090
User "system:serviceaccount:default:default" cannot list resource "endpoints" in API group "" at the cluster scope"
User "system:serviceaccount:default:default" cannot list resource "pods" in API group "" at the cluster scope"
User "system:serviceaccount:default:default" cannot list resource "services" in API group "" at the cluster scope"
Something running with ServiceAccount default in namespace default is doing things it does not have permissions for.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: monitoring-service-account
Here you create a specific ServiceAccount. You also give it some Cluster-wide permissions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: default
You run Prometheus in namespace default but do not specify a specific ServiceAccount, so it will run with ServiceAccount default.
I think your problem is that you need to reference the ServiceAccount you created in the Deployment manifest for Prometheus.
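A minimal sketch of that change, reusing the names from your manifests, is to add serviceAccountName to the pod template spec of the Deployment:
# only the relevant part of the Deployment is shown
spec:
  template:
    spec:
      serviceAccountName: monitoring-service-account
      containers:
        - name: prometheus
          image: prom/prometheus:latest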
I'm new to k8s and Prometheus. I'm trying to collect the metrics of each pod with Prometheus but am unable to do so because of the following API error:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/metrics\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}
system:anonymous means that an unauthenticated user is trying to get a resource from your cluster, which is forbidden. You will need to create a service account, give that service account the required permissions through RBAC, and then use that service account to fetch the metrics. All of that is documented.
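A minimal sketch of that approach (the names metrics-sa and metrics-reader are placeholders, not anything your cluster already has):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metrics-reader
rules:
# /metrics is a non-resource URL, so it needs nonResourceURLs rather than resources
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: metrics-reader
subjects:
- kind: ServiceAccount
  name: metrics-sa
  namespace: default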
As a workaround, you can do this:
kubectl create clusterrolebinding prometheus-admin --clusterrole cluster-admin --user system:anonymous
Note that this is a terrible idea unless you are just playing with Kubernetes: with this binding you are giving any unauthenticated user full permissions on your cluster.
Create the following manifests:
ServiceAccount.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
  namespace: grafana
ClusterRole.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  verbs:
  - list
  - watch
- nonResourceURLs:
  - "/metrics"
  verbs:
  - get
ClusterRoleBinding.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: grafana
Then tell your kube-state-metrics Deployment to use the new ServiceAccount by adding the following to its template spec: serviceAccountName: kube-state-metrics.
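A sketch of where that line goes (only the relevant fields of the Deployment shown):
spec:
  template:
    spec:
      # run kube-state-metrics with the ServiceAccount created above
      serviceAccountName: kube-state-metrics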
I upgraded my GKE API server to 1.6, and am in the process of upgrading nodes to 1.6, but ran into a snag...
I've got a prometheus server (version 1.5.2) running in a pod managed by a Kubernetes deployment with a couple of nodes running version 1.5.4 Kubelet, with a single new node running 1.6.
Prometheus can't connect to the new node: its metrics endpoint is returning 401 Unauthorized.
This seems to be a RBAC issue, but I'm not sure how to proceed. I can't find docs on what roles the Prometheus server needs, or even how to grant them to the server.
From the coreos/prometheus-operator repo I was able to piece together a configuration that I might expect to work:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
secrets:
- name: prometheus-token-xxxxx
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: prometheus-prometheus
    component: server
    release: prometheus
  name: prometheus-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus-prometheus
        component: server
        release: prometheus
    spec:
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: prometheus
      serviceAccountName: prometheus
      ...
But Prometheus is still getting 401s.
UPDATE: seems like a Kubernetes authentication issue, as Jordan said. See the new, more focused question here: https://serverfault.com/questions/843751/kubernetes-node-metrics-endpoint-returns-401
401 means unauthenticated, which means it is not an RBAC issue. I believe GKE no longer allows anonymous access to the kubelet in 1.6. What credentials are you using to authenticate to the kubelet?
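For reference, a common way to authenticate scrapes against the kubelet is to present the pod's service account token; a sketch of the relevant scrape_config fields (the file paths are the in-pod defaults, and insecure_skip_verify may be needed if the kubelet serving certificate is not signed by the cluster CA):
- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node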
This is what I have working for role definition and binding.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
As per the discussion on #JorritSalverda's ticket: https://github.com/prometheus/prometheus/issues/2606#issuecomment-294869099
Since GKE doesn't give you access to client certificates that would let you authenticate to the kubelet, the best solution for users on GKE seems to be to use the Kubernetes API server as a proxy for requests to the nodes.
To do this (quoting #JorritSalverda):
"For my Prometheus server running inside GKE I now have it running with the following relabeling:
relabel_configs:
- action: labelmap
  regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
  replacement: kubernetes.default.svc.cluster.local:443
- target_label: __scheme__
  replacement: https
- source_labels: [__meta_kubernetes_node_name]
  regex: (.+)
  target_label: __metrics_path__
  replacement: /api/v1/nodes/${1}/proxy/metrics
And the following ClusterRole bound to the service account used by Prometheus:
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
Because the GKE cluster still has an ABAC fallback in case RBAC fails, I'm not 100% sure yet that this covers all required permissions.