Monitoring a Kubernetes cluster using Prometheus from outside the k8s cluster

We have a Kubernetes cluster where I have a service account "kube" in the namespace "monitoring", with a cluster role binding created to monitor the cluster.
We have Prometheus installed on an on-prem Linux system outside the cluster, installed as "root".
When I try to connect to the Kubernetes HTTPS API using the ca.crt and user token (provided by the Kubernetes admin), it throws multiple errors.
Error messages:
component="discovery manager scrape" msg="Cannot create service discovery" err="unable to use specified CA cert /root/prometheus/ca.crt" type=*kubernetes.SDConfig
component="discovery manager scrape" msg="Cannot create service discovery" err="unable to use specified CA cert /root/prometheus/ca.crt" type=*kubernetes.SDConfig
Prometheus configuration:
- job_name: 'kubernetes-apiservers'
  scheme: https
  tls_config:
    ca_file: /root/prometheus/ca.crt
  bearer_token_file: /root/prometheus/user_token
  kubernetes_sd_configs:
  - role: endpoints
    api_server: https://example.com:1234
    bearer_token_file: /root/prometheus/user_token
    tls_config:
      ca_file: /root/prometheus/prometheus-2.12.0.linux-amd64/ca.crt
  relabel_configs:
  - source_labels: [monitoring, monitoring-sa, 6443]
    action: keep
    regex: default;kubernetes;https

- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    ca_file: /root/prometheus/ca.crt
  bearer_token_file: /root/prometheus/user_token
  kubernetes_sd_configs:
  - role: node
    api_server: https://example.com:1234
    bearer_token_file: /root/prometheus/user_token
    tls_config:
      ca_file: /root/prometheus/ca.crt
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: https://example.com:1234
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics

The main problem you're facing is: "unable to use specified CA cert /root/prometheus/ca.crt"
Someone recently faced the same problem:
https://github.com/prometheus/prometheus/issues/6015#issuecomment-532058465
They solved it by reinstalling with a newer version.
Version 2.13.1 is out. Try installing the latest version; it might solve your problem too.
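As a sketch of the upgrade on the on-prem host (the download URL follows the standard release naming and is an assumption; pick the version and architecture you actually need):

# Download and unpack a newer Prometheus release, then reuse the existing config and credentials
# (URL/version are assumptions; check the Prometheus releases page)
cd /root/prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.13.1/prometheus-2.13.1.linux-amd64.tar.gz
tar xzf prometheus-2.13.1.linux-amd64.tar.gz
./prometheus-2.13.1.linux-amd64/prometheus --config.file=prometheus.yml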

Your ca.crt is most probably still base64-encoded, since secret data is encoded that way when you describe a secret, as explained here.
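As a sketch, assuming the service account token secret is called kube-token-xxxxx in the monitoring namespace (the secret name is an assumption; list the secrets first), the files can be decoded like this:

# Decode the base64-encoded CA certificate and token from the service account secret
# (secret name "kube-token-xxxxx" is an assumption; find yours with `kubectl -n monitoring get secrets`)
kubectl -n monitoring get secret kube-token-xxxxx -o jsonpath='{.data.ca\.crt}' | base64 -d > /root/prometheus/ca.crt
kubectl -n monitoring get secret kube-token-xxxxx -o jsonpath='{.data.token}' | base64 -d > /root/prometheus/user_token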

Maybe your ca.crt has some error. Check your CA cert file and make sure its format looks like this:
-----BEGIN CERTIFICATE-----
xxxxx
-----END CERTIFICATE-----
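A quick way to check that the file parses as a valid PEM certificate (assuming openssl is available on the Prometheus host):

# Print the certificate details; this fails loudly if the file is not valid PEM
openssl x509 -in /root/prometheus/ca.crt -noout -text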
I think your ca.crt was obtained with kubectl get serviceaccount -o yaml, but that only gives you the public CA of your Kubernetes cluster. If you want to get the token as well, you can specify the serviceAccountName in a new Deployment, like this:
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: test
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: test
        version: v1
    spec:
      serviceAccountName: prometheus
      containers:
      - name: test
        image: alpine
        imagePullPolicy: Always
        command: ["ping", "127.0.0.1"]
      imagePullSecrets:
      - name: harbor-secret
      restartPolicy: Always
Then, get your token and ca.crt under /var/run/secrets/kubernetes.io/serviceaccount/.
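For example, assuming the Deployment above created a pod named test-xxxxx (the pod name is an assumption; look it up with kubectl get pods), you could copy the already-decoded files out of the pod like this:

# Copy the mounted service account credentials out of the running pod
# (pod name "test-xxxxx" is an assumption; find yours with `kubectl get pods`)
kubectl exec test-xxxxx -- cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt > ca.crt
kubectl exec test-xxxxx -- cat /var/run/secrets/kubernetes.io/serviceaccount/token > user_token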

Related

Prometheus configuration for monitoring Orleans in Kubernetes

I'm trying to get Prometheus functioning with my Orleans silos...
I use this consumer to expose Orleans metrics for Prometheus on port 8082. With a local Prometheus instance and using the grafana.json from the same repository I see that it works.
_ = builder.AddPrometheusTelemetryConsumerWithSelfServer(port: 8082);
Following this guide to install Prometheus on Kubernetes, in a different namespace than the one my silos are deployed in.
Following the instructions, I added the Prometheus annotations to my Orleans deployment yaml:
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mysilo
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8082'
      labels:
        app: mysilo
My job in prometheus yml:
- job_name: "orleans"
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- orleans
selectors:
- role: "pod"
label: "app=mysilo"
According to the same guide, all the pods' metrics get discovered if "the pod metadata is annotated with prometheus.io/scrape and prometheus.io/port annotations." I assume I don't need any extra installations.
With all this, and port-forwarding my Prometheus pod, I can see Prometheus is working at http://localhost:9090/metrics, but no metrics are being shown in my Grafana dashboard (again, I could make it work on my local machine with only one silo).
When exploring grafana I find that it seems it can't find the instances:
sum(rate(process_cpu_seconds_total{job=~"orleans", instance=~"()"}[3m])) * 100
The aim is to monitor the resources my Orleans silos are using (not the pod metrics themselves, but Orleans metrics), but I'm missing something :(
Thanks to #BozoJoe's comment I could debug this.
The problem was that it was trying to scrape ports 30000 and 1111 instead of 8082 like I said before. I could see this thanks to the Prometheus dashboard at localhost:9090/targets
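For reference, reaching that dashboard from outside the cluster can be done with a port-forward (a sketch; the namespace and deployment name are assumptions and depend on how Prometheus was installed):

# Forward the Prometheus UI locally so /targets and /graph can be inspected
# (namespace and deployment name are assumptions; adjust to your install)
kubectl -n monitoring port-forward deploy/prometheus-server 9090:9090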
So I went to the Prometheus config file and made sure to scrape the correct port (I also added some restrictions on the name matching):
- job_name: "orleans"
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- orleans
selectors:
- role: "pod"
label: "app=mysilo"
relabel_configs:
- source_labels: [__meta_kubernetes_pod_container_name]
action: keep
regex: 'my-silo-name*'
- source_labels: [__address__]
action: replace
regex: ([^:]+):.*
replacement: $1:8081
target_label: __address__

Prometheus can't find api services running in Kubernetes

I have deployed Prometheus to a running kubernetes cluster.
I want it to use service discovery to find all our services and scrape their health endpoint to collect their metrics.
The problem is that even though it finds the services using the scrape config kubernetes-apiservers, it says all services are down.
I don't need an absolute solution to the problem; even suggestions on how I might debug this would be very helpful.
Within my config I have
scrape_configs:
- job_name: kubernetes-apiservers
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  follow_redirects: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: default;kubernetes;https
    replacement: $1
    action: keep
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: apiserver
    action: replace
When I look at the service discovery tab, it says that only 1 of 43 kubernetes-apiservers targets is available.
On the page status -> targets everything seems to be up.
Any guess as to where I can see why the 42 other services aren't being scraped?
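One way to debug this kind of "target down" state (a sketch, run from inside the Prometheus pod where the service account token is mounted; the target address 10.96.0.1 is an assumption, copy a real one from the service-discovery page) is to probe a discovered target manually with the same credentials Prometheus uses:

# Probe one target by hand with the token Prometheus is configured to use
# (the target address is an assumption; take a real one from the /service-discovery page)
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -k -H "Authorization: Bearer ${TOKEN}" https://10.96.0.1:443/metrics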

Prometheus kubernetes-pods Get "https:// xx.xx.xx:443 /metrics": dial tcp xx.xx.xx:443: connect: connection refused

I have configured Prometheus on one of the kubernetes cluster nodes using [this][1]. After that I added the following prometheus.yml file. I can list nodes and apiservers, but all the pods show as down with the error:
Get "https:// xx.xx.xx:443 /metrics": dial tcp xx.xx.xx:443: connect: connection refused, and for some pods the status is unknown.
Can someone point out what I'm doing wrong here?
cat prometheus.yml
global:
  scrape_interval: 1m

scrape_configs:
- job_name: 'prometheus'
  scrape_interval: 5s
  static_configs:
  - targets: ['localhost:9090']

# metrics for default/kubernetes api's from the kubernetes master
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
    bearer_token_file: /dfgdjk/token
    api_server: https://masterapi.com:3343
    tls_config:
      insecure_skip_verify: true
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /dfgdjk/token
  scheme: https
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name

# metrics for default/kubernetes api's from the kubernetes master
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
    api_server: https://masterapi.com:3343
    bearer_token_file: /dfgdjk/token
    tls_config:
      insecure_skip_verify: true
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /dfgdjk/token
  scheme: https
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
[1]: https://devopscube.com/install-configure-prometheus-linux/
It's impossible to get metrics to an external Prometheus server without having any Prometheus components inside the Kubernetes cluster. This happens because the cluster network is isolated from the host's network, so it's not possible to scrape metrics from pods directly from outside the cluster.
Please refer to the Monitoring kubernetes with prometheus from outside of k8s cluster GitHub issue.
There are a few options:
Install Prometheus inside the cluster using the Prometheus operator or manually - example.
Use a proxy solution, for example this one from the same thread on GitHub - k8s-prometheus-proxy.
On top of the Prometheus installed within the cluster, it's possible to run an external Prometheus in federation, so all metrics are stored outside of the cluster; please refer to Prometheus federation (see the sketch after this list).
Also, an important part is that kube-state-metrics should be installed in the Kubernetes cluster as well. How to set it up.
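A minimal federation sketch for the external Prometheus, assuming the in-cluster Prometheus exposes its /federate endpoint at prometheus.example.com:9090 (the target address and the match[] selectors are assumptions):

# Pull aggregated series from the in-cluster Prometheus via its /federate endpoint
# (target address and match[] selectors are assumptions; adjust to your setup)
- job_name: 'federate'
  honor_labels: true
  metrics_path: /federate
  params:
    'match[]':
    - '{job=~"kubernetes-.*"}'
  static_configs:
  - targets:
    - 'prometheus.example.com:9090'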
Edit: you can also refer to another SO question/answer, which confirms that this works only with additional steps; the OP there resolved it with another proxy solution.

Prometheus targets: server returned HTTP status 403 Forbidden

I have set up Prometheus running in my Kubernetes cluster, and I configured the Kubernetes certificates in the Prometheus configuration file, but for some targets I am getting back "server returned HTTP status 403 Forbidden". This is part of my config:
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /etc/k8spem/ca.pem
    cert_file: /etc/k8spem/admin.pem
    key_file: /etc/k8spem/admin.key
  bearer_token_file: /etc/k8spem//token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
I have already configured the certificates, so why still 403?
By the way, I can get results on the CLI by executing this command: curl -k --cacert /work/deploy/kubernetes/security/ca.pem --cert /work/deploy/kubernetes/security/admin.pem --key /work/deploy/kubernetes/security/admin.key --cert-type PEM https://172.16.5.150:6443/metrics
I don't know why, but I mounted a new directory, deleted the old ConfigMap and recreated it, and it worked. I think maybe I just forgot to reapply the ConfigMap.

Prometheus Outside Kubernetes Cluster

I'm trying to configure Prometheus outside the Kubernetes cluster.
Below is my Prometheus config.
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
    api_server: https://10.0.4.155:6443
  scheme: https
  tls_config:
    insecure_skip_verify: true
  basic_auth:
    username: kube
    password: Superkube01
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
This is how it looks:
root@master01:~# kubectl cluster-info
Kubernetes master is running at https://10.0.4.155:6443
root@master01:~# kubectl get endpoints
NAME                 ENDPOINTS                                          AGE
kubernetes           10.0.4.103:6443,10.0.4.138:6443,10.0.4.155:6443   11h
netchecker-service   10.2.0.10:8081                                     11h
root@master01:~#
But when starting Prometheus, I'm getting the error below.
level=error ts=2018-05-29T13:55:08.171451623Z caller=main.go:216 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:270: Failed to list *v1.Pod: Get https://10.0.4.155:6443/api/v1/pods?resourceVersion=0: x509: certificate signed by unknown authority"
Could anyone please tell me what I'm doing wrong here?
Thanks,
Pavanasam R
The error indicates that the certificate presented by your apiserver is signed by a CA that Prometheus does not trust, so the metric collection request fails TLS verification.
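As a sketch of one way to address that x509 error, you can point the service discovery section at the cluster's CA certificate instead of relying on insecure_skip_verify (the ca_file path is an assumption; use the CA file your cluster admin provided):

# Trust the cluster CA for the API server connection
# (the ca_file path is an assumption; point it at the CA handed out by your admin)
kubernetes_sd_configs:
- role: endpoints
  api_server: https://10.0.4.155:6443
  tls_config:
    ca_file: /root/prometheus/ca.crt
  basic_auth:
    username: kube
    password: Superkube01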
You really need to format your code in a code block so we can see the yaml formatting. kubernetes_sd_configs seems to be the wrong home for insecure_skip_verify and basic_auth according to this link. Might want to move them and try scraping again.
As of now your insecure_skip_verify is part of the job-level tls_config. Add it in the api_server (kubernetes_sd_configs) context as well:
kubernetes_sd_configs:
- api_server: https://<ip>:6443
  role: node
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true
In order to access the Kubernetes API endpoint you need to authenticate the client, either through basic_auth, a bearer token, or tls_config. Please go through this; it will be helpful.