Prometheus configuration for monitoring Orleans in Kubernetes

I'm trying to get Prometheus functioning with my Orleans silos...
I use this consumer to expose Orleans metrics for Prometheus on port 8082. With a local Prometheus instance and the grafana.json from the same repository, I can see that it works:
_ = builder.AddPrometheusTelemetryConsumerWithSelfServer(port: 8082);
I followed this guide to install Prometheus on Kubernetes, in a different namespace than the one where my silos are deployed.
Following the instructions, I added the Prometheus annotations to my Orleans deployment YAML:
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mysilo
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8082'
      labels:
        app: mysilo
My job in prometheus.yml:
- job_name: "orleans"
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- orleans
selectors:
- role: "pod"
label: "app=mysilo"
According to the same guide, all pod metrics get discovered if "the pod metadata is annotated with prometheus.io/scrape and prometheus.io/port annotations", so I assume I don't need any extra installation.
With all this, and after port-forwarding my Prometheus pod, I can see Prometheus is working at http://localhost:9090/metrics, but no metrics show up in my Grafana dashboard (again, I could make it work on my local machine with only one silo).
When exploring Grafana, it seems it can't find the instances:
sum(rate(process_cpu_seconds_total{job=~"orleans", instance=~"()"}[3m])) * 100
The aim is to monitor the resources my Orleans silos are using (not the pod metrics themselves, but Orleans metrics), but I'm missing something :(

Thanks to @BozoJoe's comment I could debug this.
The problem was that Prometheus was trying to scrape ports 30000 and 1111 instead of 8082 as stated above. I could see this thanks to the Prometheus dashboard at localhost:9090/targets.
So I went to the Prometheus config file and made sure it scrapes the correct port (I also added a restriction on the container name):
- job_name: "orleans"
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- orleans
selectors:
- role: "pod"
label: "app=mysilo"
relabel_configs:
- source_labels: [__meta_kubernetes_pod_container_name]
action: keep
regex: 'my-silo-name*'
- source_labels: [__address__]
action: replace
regex: ([^:]+):.*
replacement: $1:8081
target_label: __address__
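If you prefer not to hardcode the port in the relabel rule, a common alternative (a sketch, not part of the original answer) is to derive it from the prometheus.io/port annotation that the deployment template already sets:
  relabel_configs:
    # Sketch: keep only annotated pods and build __address__ from the
    # prometheus.io/port annotation instead of a hardcoded port.
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: 'true'
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__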

Related

Limit prometheus to discover pods in specific namespaces ONLY

I am trying to run Prometheus to ONLY monitor pods in specific namespaces (in an OpenShift cluster).
I am getting "cannot list pods at the cluster scope", but I have tried to configure it not to use cluster scope (only look in specific namespaces instead).
I've set:
prometheus.yml: |
  scrape_configs:
    - job_name: prometheus
      static_configs:
        - targets:
            - localhost:9090
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
        - namespaces:
            names:
              - api-mytestns1
              - api-mytestns2
          role: pod
      relabel_configs:
      [cut]
I get this error even if I remove the job_name: kubernetes-pods entry entirely, so maybe it's something else in Prometheus that needs disabling?
I found that one has to overwrite server.alertmanagers with a complete copy of the settings in charts/prometheus/templates/server-configmap.yaml, to override the hardcoded defaults there that try to scrape cluster-wide.
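Separately, the "cannot list pods at the cluster scope" message usually means the Prometheus service account only has namespace-scoped permissions while some job is still doing cluster-wide discovery. If you intend to keep discovery limited to specific namespaces, a minimal Role/RoleBinding per monitored namespace could look like this (a sketch; the account and namespace names are illustrative):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-pod-reader
  namespace: api-mytestns1
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-pod-reader
  namespace: api-mytestns1
subjects:
  - kind: ServiceAccount
    name: prometheus-server   # assumed name of the Prometheus service account
    namespace: monitoring     # assumed namespace where Prometheus itself runs
roleRef:
  kind: Role
  name: prometheus-pod-reader
  apiGroup: rbac.authorization.k8s.io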

How to configure a prometheus target for kubelet metrics

I would like to plot, in Grafana, the metrics for the readiness/liveness probes of some of my pods. Currently, the way I am deploying Prometheus in my cluster is:
helm install prometheus stable/prometheus -n prometheus
I am able to see all standard metrics by going to the Prometheus UI, but I am trying to figure out how to get the probe metrics. Apparently the kubelet exposes these metrics at /metrics/probes, but I don't know how to configure them. Moreover, I noted that apparently the "standard" metrics are grabbed from the Kubernetes api-server on the /metrics/ path, but so far I haven't configured any path nor any config file (I just ran the above command to install Prometheus). I am assuming that this /metrics/ path is hardcoded somewhere in the Helm chart repo, but since I want to get the metrics for the kubelets, this might be trickier, as my understanding is that the api-server lives on the master k8s node, and the kubelet only runs on the worker nodes (so I have no idea where to point the /metrics/probes path).
Use the Prometheus Operator and create a ServiceMonitor in which you can specify the endpoints and ports exposed by the kubelet or any other component. Prometheus will start scraping those endpoints for metrics:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  labels:
    k8s-app: kubelet
spec:
  jobLabel: k8s-app
  endpoints:
  - port: https-metrics
    scheme: https
    interval: 30s
    tlsConfig:
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  - port: https-metrics
    scheme: https
    path: /metrics/cadvisor
    interval: 30s
    honorLabels: true
    tlsConfig:
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  selector:
    matchLabels:
      k8s-app: kubelet
  namespaceSelector:
    matchNames:
    - kube-system
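Since the question is specifically about probe metrics, an extra entry could be appended under spec.endpoints to scrape the kubelet's /metrics/probes path as well (a sketch, assuming the same https-metrics port serves that path):
  # Sketch: additional endpoint for kubelet probe metrics
  - port: https-metrics
    scheme: https
    path: /metrics/probes
    interval: 30s
    tlsConfig:
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token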

Monitoring Kubernetes cluster using prometheus outside the k8 cluster

We have a Kubernetes cluster where I have a service account "kube" in the namespace "monitoring", with a cluster role binding created to monitor the cluster.
We have Prometheus installed on a Linux system (on-prem) outside the cluster, installed as "root".
When I try to connect to the k8s cluster over the HTTPS API using the ca.crt and user token (given by the Kubernetes admin), it throws multiple errors.
Error messages:
component="discovery manager scrape" msg="Cannot create service discovery" err="unable to use specified CA cert /root/prometheus/ca.crt" type=*kubernetes.SDConfig
component="discovery manager scrape" msg="Cannot create service discovery" err="unable to use specified CA cert /root/prometheus/ca.crt" type=*kubernetes.SDConfig
Prometheus configuration:
- job_name: 'kubernetes-apiservers'
  scheme: https
  tls_config:
    ca_file: /root/prometheus/ca.crt
  bearer_token_file: /root/prometheus/user_token
  kubernetes_sd_configs:
    - role: endpoints
      api_server: https://example.com:1234
      bearer_token_file: /root/prometheus/user_token
      tls_config:
        ca_file: /root/prometheus/prometheus-2.12.0.linux-amd64/ca.crt
  relabel_configs:
    - source_labels: [monitoring, monitoring-sa, 6443]
      action: keep
      regex: default;kubernetes;https

- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    ca_file: /root/prometheus/ca.crt
  bearer_token_file: /root/prometheus/user_token
  kubernetes_sd_configs:
    - role: node
      api_server: https://example.com:1234
      bearer_token_file: /root/prometheus/user_token
      tls_config:
        ca_file: /root/prometheus/ca.crt
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: https://example.com:1234
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics
The main problem you're facing is: "unable to use specified CA cert /root/prometheus/ca.crt".
Someone recently faced the same problem:
https://github.com/prometheus/prometheus/issues/6015#issuecomment-532058465
They solved it by reinstalling with a newer version.
Version 2.13.1 is out. Try installing the latest version, it might solve your problem too.
Your ca.crt is most probably still in base64 format since secrets are encoded that way when describing them, as explained here.
Maybe your ca.crt has some error; check your CA cert file and make sure its format looks like this:
-----BEGIN CERTIFICATE-----
xxxxx
-----END CERTIFICATE-----
I think your ca.crt was obtained via kubectl get serviceaccount -o yaml, but that is the public certificate of your Kubernetes cluster; if you want to get the token, you can specify the serviceAccountName in the YAML of a new Deployment, like this:
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: test
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: test
        version: v1
    spec:
      serviceAccountName: prometheus
      containers:
        - name: test
          image: alpine
          imagePullPolicy: Always
          command: ["ping", "127.0.0.1"]
      imagePullSecrets:
        - name: harbor-secret
      restartPolicy: Always
Then get your token and ca.crt from /var/run/secrets/kubernetes.io/serviceaccount/ inside the pod.

Unable to scrape metrics from pods

I am able to scrape Prometheus metrics from a Kubernetes service using this Prometheus job configuration:
- job_name: 'prometheus-potapi'
  static_configs:
    - targets: ['potapi-service.potapi:1234']
It uses Kubernetes DNS and gives me the metrics from any one of the three pods I use for my service.
I would like to see the result for each pod.
I am able to see the data I want using this configuration:
- job_name: 'prometheus-potapi-pod'
  static_configs:
    - targets: ['10.1.0.126:1234']
I have searched and experimented using the service discovery mechanism available in Prometheus. Unfortunately, I don't understand how it should be setup. The service discovery reference isn't really helpful if you don't know how it works.
I am looking for an example where the job using the IP number is replaced with some service discovery mechanism. Specifying the IP was enough for me to see that the data I'm looking for is exposed.
The pods I want to scrape metrics from all live in the same namespace, potapi.
The metrics are always exposed through the same port, 1234.
Finally, they are all named like this:
potapi-deployment-754d96f855-lkh4x
potapi-deployment-754d96f855-pslgg
potapi-deployment-754d96f855-z2zj2
When I do
kubectl describe pod potapi-deployment-754d96f855-pslgg -n potapi
I get this description:
Name:           potapi-deployment-754d96f855-pslgg
Namespace:      potapi
Node:           docker-for-desktop/192.168.65.3
Start Time:     Tue, 07 Aug 2018 14:18:55 +0200
Labels:         app=potapi
                pod-template-hash=3108529411
Annotations:    <none>
Status:         Running
IP:             10.1.0.127
Controlled By:  ReplicaSet/potapi-deployment-754d96f855
Containers:
  potapi:
    Container ID:   docker://72a0bafbda9b82ddfc580d79488a8e3c480d76a6d17c43d7f7d7ab18458c56ee
    Image:          potapi-service
    Image ID:       docker://sha256:d64e94c2dda43c40f641008c122e6664845d73cab109768efa0c3619cb0836bb
    Ports:          4567/TCP, 4568/TCP, 1234/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 07 Aug 2018 14:18:57 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4fttn (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  default-token-4fttn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4fttn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
How would you rewrite the job definition given these prerequisites?
Here they use example.io/scrape=true (and similar annotations for specifying the scrape port and the scrape path if it's not /metrics), which is how one achieves the "autodiscovery" part.
If you apply that annotation -- and the relevant config snippets in the Prom config -- to a Service, then Prom will scrape the port and path on the Service, meaning you will have stats for the Service itself, and not the individual Endpoints behind it. Similarly, if you label the Pods, you will gather metrics for the Pods but they would need to be rolled up to have a cross-Pod view of the state of affairs. There are multiple different resource types that can be autodiscovered, including node and ingress, also. They all behave similarly.
Unless you have grave CPU or storage concerns for your Prom instance, I absolutely wouldn't enumerate the scrape targets in the config like that: I would use the scrape annotations, meaning you can change who is scraped, what port, etc. without having to reconfigure Prom each time.
Be aware that if you want to use their example as-is, and you want to apply those annotations from within the kubernetes resource YAML, ensure that you quote the : 'true' value, otherwise YAML will promote that to be a boolean literal, and kubernetes annotations can only be string values.
Applying the annotations from the command line will work just fine:
kubectl annotate pod -l app=potapi example.io/scrape=true
(BTW, they use example.io/ in their example, but there is nothing special about that string except it namespaces the scrape part to keep it from colliding with something else named scrape. So feel free to use your organization's namespace if you wish to avoid having something weird named example.io/ in your cluster)
I ended up with this solution:
...
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__address__]
      action: replace
      regex: ([^:]+)(?::\d+)?
      replacement: $1:1234
      target_label: __address__
...
There are two parts.
Check for an annotation prometheus.io/scrape with the value 'true'. This is done in the first source_labels entry.
It may not be self-evident that prometheus_io_scrape translates to prometheus.io/scrape.
Get the address and add the desired port to it. This is done in the second source_labels entry. The __address__ source will contain a host name or IP number. In this case, an IP number is extracted using the cryptic regex ([^:]+)(?::\d+)?. The port I want to use is 1234, so I hardcoded it in replacement. The result is that __address__ will now contain the IP of the pod with port 1234 attached, in the format 10.1.0.172:1234, where 10.1.0.172 is the IP number found.
With this configuration in Prometheus I should be able to find pods with the proper annotation.
Where should the annotation be added then? I ended up adding it in my Kubernetes deployment template description.
The complete deployment description looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: potapi-deployment
  namespace: potapi
  labels:
    app: potapi
spec:
  replicas: 3
  selector:
    matchLabels:
      app: potapi
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
      labels:
        app: potapi
    spec:
      containers:
        - name: potapi
          image: potapi-service
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 4567
              name: service
            - containerPort: 1234
              name: metrics
The interesting annotation is added in the template section.
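If you also want the scrape port and path to be discoverable through annotations rather than hardcoded in the relabel rule, the same template metadata could carry them as well (a sketch; the relabel configuration would then need to read the corresponding __meta_kubernetes_pod_annotation_* labels, and the values must stay quoted strings):
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '1234'
        prometheus.io/path: '/metrics'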

Prometheus + Heapster

I saw there is no sink configuration for Prometheus in this Heapster document. Is there any simple way to combine these two and monitor?
Prometheus uses a pull model to retrieve the data, while Heapster is a tool that pushes its metrics to a certain endpoint (push model).
I assume you want to get Kubernetes metrics into Prometheus. You don't need Heapster for that, since cAdvisor has a Prometheus endpoint which can be scraped directly. The kubelet itself also provides some metrics.
The Prometheus config would look like this:
- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)

- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - source_labels: [__meta_kubernetes_node_address_InternalIP]
      target_label: __address__
      regex: (.*)
      replacement: $1:4194
This assumes you are using the default cAdvisor port 4194. Prometheus should also be able to detect the correct kubelet port.
Additional note: the job for scraping cAdvisor is only required when using a Kubernetes version >= 1.7. Before that, the cAdvisor metrics accidentally got exposed via the kubelet.
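On clusters where the standalone cAdvisor port 4194 is no longer exposed, a common alternative (a sketch, not part of the original answer; assumes Prometheus runs in-cluster with the default service account token and CA mounted) is to scrape cAdvisor through the API server proxy:
- job_name: 'kubernetes-cadvisor-proxy'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    # Route all node scrapes through the Kubernetes API server service
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    # Rewrite the metrics path to the per-node cAdvisor proxy endpoint
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor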