Scrape CoreDNS metrics with External Prometheus - kubernetes

I have a Kubernetes cluster (built with the Typhoon module) and a Prometheus instance in a different VPC (running on docker-compose, not on the Kubernetes cluster). The VPC peering connection is enabled and the required ports are open to this VPC. All the metrics are being scraped as expected except for the coredns pods. The issue is that the coredns pods are assigned 10.2.*.* IPs, which is different from the IP range I configured for the pods to run in.
If the coredns pods got a 172.*.*.* IP, my Prometheus would be able to reach them and the scraping would succeed.
Now I'm not sure how to scrape these metrics. Please let me know if you can tell what I am doing wrong.
$ kubectl get pods -n kube-system -o wide | grep coredns
coredns-7d8995c4cd-4l4ft 1/1 Running 1 7d1h 10.2.5.2 ip-172-*-*-* <none> <none>
coredns-7d8995c4cd-vxd9d 1/1 Running 1 6d3h 10.2.3.9 ip-172-*-*-* <none> <none>
My prometheus.yml file is configured with the job below.
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
    - role: endpoints
      api_server: https://kubernetes-cluster:6443
      tls_config:
        insecure_skip_verify: true
      bearer_token: "TOKEN"
  bearer_token: "TOKEN"
  honor_labels: true
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: pod
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: job
  metric_relabel_configs:
    - source_labels: [__name__]
      action: drop
      regex: etcd_(debugging|disk|request|server).*
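For reference, the keep rule above only retains services annotated for scraping. A hedged sketch of the annotations the CoreDNS (kube-dns) Service would carry for this job to pick it up; the names and the 9153 metrics port are assumptions based on common defaults, and this alone does not solve the pod-IP reachability problem described above:

apiVersion: v1
kind: Service
metadata:
  name: kube-dns                     # often "coredns" instead, depending on the distribution
  namespace: kube-system
  annotations:
    prometheus.io/scrape: "true"     # what the keep rule above matches on
    prometheus.io/port: "9153"       # CoreDNS's usual metrics port (assumed)
spec:
  selector:
    k8s-app: kube-dns                # assumed label; check your cluster's coredns labels
  ports:
    - name: metrics
      port: 9153
      targetPort: 9153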
P.S.: I'm using Flannel as my network CNI, expecting the pods to be created with IPs from the host network itself.
Updated Info:
I tried deploying Prometheus on Kubernetes and federating its data to my docker-compose Prometheus, as suggested by Yaron.
I'm trying the config below for the federation, but I don't see any metrics loaded into my target Prometheus.
- job_name: 'federate'
  scrape_interval: 10s
  honor_labels: true
  metrics_path: '/federate'
  params:
    'match[]':
      - '{job="prometheus"}'
      - '{job="kubernetes-nodes"}'
      - '{job="kubernetes-apiservers"}'
      - '{job="kubernetes-service-endpoints"}'
      - '{job="kubernetes-cadvisor"}'
      - '{job="kubelet"}'
      - '{job="etcd"}'
      - '{job="kubernetes-services"}'
      - '{job="kubernetes-pods"}'
  scheme: https
  static_configs:
    - targets:
        - prom.mycompany.com

The best practice for solving this issue is to run a Prometheus instance inside the cluster that runs CoreDNS, and federate the metrics scraped by that Prometheus into your external Prometheus running with docker-compose.
You can read more about federation here to get an idea of how to start leveraging it.
A more advanced use case would be using Thanos to better distribute queries across your different Prometheus servers, but the main point remains: run an internal Prometheus server within each of your clusters.
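As a minimal sketch of that setup (the name, namespace and NodePort below are assumptions, not taken from the question): expose the in-cluster Prometheus so the docker-compose instance can reach its /federate endpoint, for example with a NodePort Service, and point a federate job like the one shown above at <node-ip>:30090.

apiVersion: v1
kind: Service
metadata:
  name: prometheus-federate          # hypothetical name
  namespace: monitoring              # assumes the in-cluster Prometheus runs here
spec:
  type: NodePort
  selector:
    app: prometheus                  # must match the in-cluster Prometheus pod labels
  ports:
    - name: web
      port: 9090
      targetPort: 9090
      nodePort: 30090                # external Prometheus scrapes <node-ip>:30090/federate

An Ingress or LoadBalancer Service works just as well; the only requirement is that the external Prometheus can reach the in-cluster one over the peered VPC.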

Related

How to change port number using relabel configs

Is there a way to specify a specific port using Prometheus relabel_configs?
I've deployed a Prometheus helm chart to my Kubernetes cluster, and it works fine, except for a small issue.
kubernetes-service-endpoints (6/8 up)
Looking at the endpoints that are down, I have narrowed the issue to this block of my Prometheus configuration.
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
  action: replace
  target_label: __address__
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $1:$2
- action: labelmap
  regex: __meta_kubernetes_service_label_(.+)
It looks like if the prometheus.io/port annotation is declared in the service definition, then the declared port is used to replace the port in __address__.
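For illustration, a sketch of how that rule rewrites the target address; the before/after label values in the comments are taken from the values mentioned in this question and are assumed, not verified:

- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
  action: replace
  target_label: __address__
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $1:$2
  # before: __address__                 = 100.119.59.4:10055   (pod target port)
  #         ...prometheus_io_port       = 9153                 (service annotation)
  # joined: "100.119.59.4:10055;9153" -> $1 = 100.119.59.4, $2 = 9153
  # after:  __address__                 = 100.119.59.4:9153    (hence the connection refused)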
My cluster is deployed using Kops, and the kube-dns service appears to have these annotations out of the box:
prometheus.io/port: "9153"
prometheus.io/scrape: "true"
But on the backend pods, the annotation is prometheus.io/port: 10055.
Because of this particular relabel_configs block, port 10055 is being replaced with 9153, and I get the error
Get "http://100.119.59.4:9153/metrics": dial tcp 100.119.59.4:9153: connect: connection refused
Is there a way to get Prometheus to use port 10055 instead of 9153?

Prometheus can't find api services running in Kubernetes

I have deployed Prometheus to a running kubernetes cluster.
I want it to use service discovery to find all our services and scrape their health endpoint to collect their metrics.
The problem is that even though it finds the services using the kubernetes-apiservers scrape config, it says all the services are down.
I don't need an absolute solution to the problem even suggestions on how I might debug this would be very helpful.
Within my config I have
scrape_configs:
  - job_name: kubernetes-apiservers
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: https
    authorization:
      type: Bearer
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    follow_redirects: true
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: default;kubernetes;https
        replacement: $1
        action: keep
      - separator: ;
        regex: (.*)
        target_label: job
        replacement: apiserver
        action: replace
When I look at the service discovery tab, it says that only 1 of 43 kubernetes-apiservers targets is available.
On the Status -> Targets page everything seems to be up.
Any guess as to where I can see why the 42 other services aren't being scraped?
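For reference, a sketch of what that keep rule actually matches; the non-apiserver label values in the comments are made up for illustration:

- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
  separator: ;
  regex: default;kubernetes;https
  action: keep
  # apiserver endpoint: joined value "default;kubernetes;https" -> matches, target kept
  # any other service:  e.g. "payments;orders-api;http"         -> no match, target dropped
  # so this job only ever scrapes the API server itself; other services need their own
  # job (for example an annotation-based kubernetes-service-endpoints job)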

Prometheus kubernetes-pods Get "https://xx.xx.xx:443/metrics": dial tcp xx.xx.xx:443: connect: connection refused

I have configured Prometheus on one of the Kubernetes cluster nodes using [this][1]. After that I added the following prometheus.yml file. I can list nodes and apiservers, but all the pods show as down with the error
Get "https://xx.xx.xx:443/metrics": dial tcp xx.xx.xx:443: connect: connection refused, and for some pods the status is unknown.
Can someone point out what I am doing wrong here?
cat prometheus.yml
global:
  scrape_interval: 1m

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  # metrics for default/kubernetes api's from the kubernetes master
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
        bearer_token_file: /dfgdjk/token
        api_server: https://masterapi.com:3343
        tls_config:
          insecure_skip_verify: true
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /dfgdjk/token
    scheme: https
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

  # metrics for default/kubernetes api's from the kubernetes master
  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://masterapi.com:3343
        bearer_token_file: /dfgdjk/token
        tls_config:
          insecure_skip_verify: true
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /dfgdjk/token
    scheme: https
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
[1]: https://devopscube.com/install-configure-prometheus-linux/
It's impossible to get metrics to an external Prometheus server without having any Prometheus components inside the Kubernetes cluster. This happens because the cluster network is isolated from the host's network, and it's not possible to scrape metrics from pods directly from outside the cluster.
Please refer to the Monitoring kubernetes with prometheus from outside of k8s cluster GitHub issue.
There are a few options:
install Prometheus inside the cluster, using the prometheus operator or manually - example
use a proxy solution, for example this one from the same GitHub thread - k8s-prometheus-proxy
on top of the Prometheus installed within the cluster, it's possible to have an external Prometheus in federation, so all metrics are stored outside of the cluster. Please refer to prometheus federation.
Another important point is that kube-state-metrics should be installed in the Kubernetes cluster as well. How to set it up.
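Once it is running, a minimal sketch of a scrape job for it from a Prometheus inside the cluster; the service name, namespace and port below are the common defaults and are assumptions, not values from the question:

- job_name: 'kube-state-metrics'
  static_configs:
    - targets:
        # in-cluster DNS name of the kube-state-metrics Service;
        # 8080 is its usual metrics port, adjust to your deployment
        - 'kube-state-metrics.kube-system.svc:8080'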
Edit: you can also refer to another SO question/answer which confirms that this only works with additional steps; the OP there resolved it with another proxy solution.

Is there a way to make a kubernetes service name set as Prometheus Job name automatically?

How can I make my Kubernetes service name be set as the Prometheus job name automatically? In other words, is there a way for a new service created in K8s to automatically become a target in the Prometheus configuration? In Kubernetes, I would like to deploy my application as a set of services.
For every service there could be more than one pod associated.
The mapping could be done like this:
Kubernetes services to Prometheus jobs
K8s pods to instances in a Prometheus job
But I really don't know if this is feasible with some configuration changes in Prometheus. Please correct me if I am wrong anywhere.
If this is not possible, do I need to explicitly create a Prometheus job in the Prometheus configuration file every time before deployment?
You will typically want metrics per pod, just as you would when using regular nodes instead of containers/pods.
Using the Prometheus configuration below you will automatically get a target for every pod that's running on the cluster. This is the important part:
# Example scrape config for pods
#
# The relabeling allows the actual pod scrape endpoint to be configured via the
# following annotations:
#
# * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
#   pod's declared ports (default is a port-free target if none are declared).
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
As explained in the comments above, this is configured so that pods with prometheus.io/scrape set to true will be scraped by Prometheus and become targets. The pods then need to expose a metrics endpoint serving metrics in the Prometheus format. You can use the prometheus.io/path and prometheus.io/port annotations to configure where Prometheus will look for the metrics on your pod.
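For example, a Deployment whose pods opt into scraping via those annotations might look like the sketch below; the name, image, port and path are placeholders, not values from the question:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                          # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app                     # copied onto the metrics by the labelmap rule
      annotations:
        prometheus.io/scrape: "true"    # opt these pods into scraping
        prometheus.io/port: "8080"      # scrape this port instead of the declared ports
        prometheus.io/path: "/metrics"  # only needed if the path is not /metrics
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:latest   # placeholder image
          ports:
            - containerPort: 8080

Each pod then shows up as its own target, its labels (here app) end up on the metrics, and kubernetes_pod_name distinguishes the instances belonging to the same service.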

Prometheus + Heapster

I saw there is no sink configuration for Prometheus in this Heapster document. Is there any simple way to combine these two for monitoring?
Prometheus uses a pull model to retrieve the data, while Heapster is a tool that pushes its metrics to a certain endpoint (push model).
I assume you want to get Kubernetes metrics into Prometheus. You don't need Heapster for that, since cAdvisor has a Prometheus endpoint which can be scraped directly, and the kubelet itself also provides some metrics.
The Prometheus config would look like this:
- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)

- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - source_labels: [__meta_kubernetes_node_address_InternalIP]
      target_label: __address__
      regex: (.*)
      replacement: $1:4194
This assumes you are using the default cAdvisor port 4194. Prometheus should also be able to detect the correct kubelet port.
Additional note: the job for scraping cAdvisor is only required when using a Kubernetes version >= 1.7. Before that, the cAdvisor metrics accidentally got exposed via the kubelet.
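On newer clusters where the standalone cAdvisor port is no longer exposed, a common alternative is to scrape the cAdvisor metrics through the kubelet's /metrics/cadvisor path instead of port 4194; a hedged sketch, assuming an in-cluster Prometheus with the usual service-account token and the kubelet's secure port 10250:

- job_name: 'kubernetes-cadvisor'
  scheme: https
  metrics_path: /metrics/cadvisor                 # cAdvisor metrics served by the kubelet
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  kubernetes_sd_configs:
    - role: node                                  # node role targets default to the kubelet port (10250)
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)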