Prometheus cannot scrape kubernetes metrics - kubernetes

I have set up a kubernetes cluster using kubeadm. I then deployed prometheus on it using the community helm charts.
I notice that prometheus cannot scrape metrics from the scheduler, etcd or the controller manager.
For example I see errors like this:
Get "https://192.168.3.83:10259/metrics": dial tcp 192.168.3.83:10259: connect: connection refused
The reason I get these errors is that there is in fact nothing listening on https://192.168.3.83:10259/metrics. This is because kube-scheduler has --bind-address set to 127.0.0.1.
One way I can fix this is by manually editing the manifest files in /etc/kubernetes/manifests, changing --bind-address to 0.0.0.0.
When I do this prometheus is able to scrape those metrics.
However, is this the correct solution? I assume that those manifest files are actually managed by kubernetes itself, and that I should probably not edit them directly but do something else instead. But what?
edit: I have since noticed that changes I make to the manifest files do indeed get overwritten when doing an upgrade. And now I have again lost the etcd and other metrics.
I must be missing something obvious here.
I thought that maybe changing the "clusterconfiguration" configmap would do the trick. But whether you can do this (and how you should do it) is not documented anywhere.
I have an out-of-the-box kubernetes and an out-of-the-box prometheus, and it does not collect metrics. I cannot be the only one running into this issue. Is there really no solution?

Exposing kube-scheduler, etcd or the kube-controller-manager (and persisting the changes)
You can expose the metrics on 0.0.0.0, just as you have done, by editing the configmap and then rolling those changes out to each control plane node. These changes will then be persisted across upgrades. For etcd this can also be done in another way, which might be preferable (see further down).
First step: edit the configmap with the below command:
kubectl edit -n kube-system cm/kubeadm-config
Add/change the relevant bind addresses as described here; for example, for etcd as outlined below:
kind: ClusterConfiguration
etcd:
  local:
    extraArgs:
      listen-metrics-urls: http://0.0.0.0:2381
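The question also asks about kube-scheduler and the kube-controller-manager; those can be handled in the same ClusterConfiguration in a similar way. A sketch, assuming the extraArgs map syntax of recent kubeadm config versions (verify the exact keys against the kubeadm documentation for your version):
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0
scheduler:
  extraArgs:
    bind-address: 0.0.0.0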
Second step: NOTE: Please read here to understand the upgrade command before applying it to any cluster you care about, since it might also update cluster component versions (unless you just did an upgrade) :)
For the changes to be reflected you thus need to run kubeadm upgrade node on each control plane node (one at a time, please). This will bring down the affected pods (those to which you have made changes) and start new instances with the metrics exposed. You can verify before and after with, for example: netstat -tulpn | grep etcd
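For example, on each control plane node in turn (as root or with sudo):
# re-render the static pod manifests from the updated kubeadm-config
kubeadm upgrade node
# once the static pods have restarted, check what the components listen on
netstat -tulpn | grep -E 'etcd|kube-scheduler|kube-controller'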
For etcd the default port in Prometheus is 2379, so it also needs to be adjusted to 2381, as below, in your prometheus values file:
kubeEtcd:
  service:
    port: 2381
    targetPort: 2381
Source for the above solution: here
Accessing existing etcd metrics without exposing it further
For etcd metrics there is a second, perhaps preferred, way of accessing the metrics: using the already exposed https metrics endpoint on port 2379 (which requires authentication). You can verify this with curl:
curl https://<your IP>:2379/metrics -k --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key
For this to work we need to supply Prometheus with the correct certificates as a secret in kubernetes. The steps are described here and outlined below:
Create a secret in the namespace where Prometheus is deployed.
kubectl -n monitoring create secret generic etcd-client-cert --from-file=/etc/kubernetes/pki/etcd/ca.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key
Add the following to your prometheus helm values file:
prometheus:
  prometheusSpec:
    secrets: ['etcd-client-cert']
kubeEtcd:
  serviceMonitor:
    scheme: https
    insecureSkipVerify: false
    serverName: localhost
    caFile: /etc/prometheus/secrets/etcd-client-cert/ca.crt
    certFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.crt
    keyFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.key
Prometheus should now be able to access the https endpoint with the certificates that we mounted from the secret. I would say this is the preferred way for etcd, since we don't expose the plain http endpoint any further.
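To roll the new values out, you would then upgrade your Helm release; the release name, namespace and chart name below are assumptions based on the community kube-prometheus-stack chart:
helm upgrade -n monitoring prometheus prometheus-community/kube-prometheus-stack -f values.yaml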

Related

how to connect zookeeper after deploying helm chart in gke?

We are creating a vmware carvel package and I need to do a sanity check for zookeeper; how can I check the output in gke?
Zookeeper output
curl to localhost is failing to connect.
The problem is that your service type is ClusterIP, which is not reachable from outside the Kubernetes cluster. There are two ways you could do this:
If you really need to access this regularly from outside the cluster, you should deploy a service of type NodePort or LoadBalancer, or an ingress. These would be reachable from outside.
If you only want to check something quickly, you can temporarily make zookeeper visible with
kubectl port-forward zookeeper-0 2181
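With that port-forward running you can do a quick local check, for example with Zookeeper's four-letter-word commands (assuming the 4lw commands are whitelisted in your Zookeeper config):
echo ruok | nc localhost 2181   # a healthy Zookeeper answers "imok"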

How to set DNS entries & network configuration for kubernetes cluster #home (noob here)

I am currently running a Kubernetes cluster on my own homeserver (in proxmox ct's, was kinda difficult to get working because I am using zfs too, but it runs now), and the setup is as follows:
lb01: haproxy & keepalived
lb02: haproxy & keepalived
etcd01: etcd node 1
etcd02: etcd node 2
etcd03: etcd node 3
master-01: k3s in server mode with a taint for not accepting any jobs
master-02: same as above, just joining with the token from master-01
master-03: same as master-02
worker-01 - worker-03: k3s agents
If I understand it correctly, k3s ships with flannel as a pre-installed CNI, as well as traefik as an Ingress Controller.
I've set up rancher on my cluster as well as longhorn; the volumes are just zfs volumes mounted inside the agents though, and as they aren't on different hdd's I've set the replicas to 1. I have a friend running the same setup (we set them up together, just yesterday) and we are planning on joining our networks through vpn tunnels and then providing storage nodes for each other as an offsite backup.
So far I've hopefully got everything correct.
Now to my question: I've got both a static ip at home and a domain, and I've pointed that domain at my static ip.
Something like this (I don't know how dns entries are actually written, this is just off the top of my head for your reference; the entries are working fine):
A example.com. [[my-ip]]
CNAME *.example.com. example.com
I've currently made a port-forward to one of my master nodes for ports 80 & 443, but I am not quite sure how you would actually configure that with HA in mind. Also, my rancher is throwing a 503 after visiting global settings, but I have not changed anything.
So now my question: how would one actually configure the port-forward? As far as I know k3s has a load balancer pre-installed, but how would one configure those port-forwards for HA? The one master node it's pointing to could, theoretically, just stop working, and then all services would no longer be reachable from outside.
Assuming your apps are running on port 80 and port 443, your ingress should give you a service with an external IP, and you would point your DNS at that. Read below for more info.
Seems like you are not a noob! You've got a lot going on with your cluster setup. What you are asking is a bit complicated to answer and I will have to make some assumptions about your setup, but I will do my best to give you at least some initial info.
This tutorial has a ton of great info and may help you with what you are doing. They use kubeadm instead of k3s, but you can skip that section if you want and still use k3s.
https://www.debontonline.com/p/kubernetes.html
If you are setting up and installing etcd on your own, you don't need to do that; k3s will create an etcd cluster for you that runs inside pods on your cluster.
Load Balancing your master nodes
The haproxy + keepalived nodes would be configured to point to the IPs of your master nodes at port 6443 (TCP). keepalived will give you a virtual IP, and you would configure your kubeconfig (that you get from k3s) to talk to that IP. On your router you will want to reserve an IP for this (make sure not to assign it to any computers).
This is a good video that explains how to do it with a nodejs server, but the concepts are the same for your master nodes:
https://www.youtube.com/watch?v=NizRDkTvxZo
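As a rough sketch of what the haproxy side could look like (the backend IPs are placeholders for your master nodes; keepalived would float the virtual IP between lb01 and lb02):
# /etc/haproxy/haproxy.cfg (fragment)
frontend k3s-api
    bind *:6443
    mode tcp
    default_backend k3s-masters

backend k3s-masters
    mode tcp
    balance roundrobin
    server master-01 192.168.10.11:6443 check
    server master-02 192.168.10.12:6443 check
    server master-03 192.168.10.13:6443 check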
Load Balancing your applications running in the cluster
Use a K8s Service; read more about it here: https://kubernetes.io/docs/concepts/services-networking/service/
Essentially you need an external IP. I prefer to do this with MetalLB.
MetalLB gives you a service of type LoadBalancer with an external IP.
Add this flag to k3s when creating the initial master node (see the sketch after the link):
https://metallb.universe.tf/configuration/k3s/
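The flag described on that page disables k3s's built-in Klipper service load balancer so that MetalLB can take over; roughly like this (double-check the exact flag for your k3s version against the linked page):
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--disable servicelb" sh -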
Configure MetalLB:
https://metallb.universe.tf/configuration/#layer-2-configuration
You will want to reserve more IPs on your router and put them under the addresses section in the yaml below. In this example you have 11 IPs in the range 192.168.1.240 to 192.168.1.250.
Create this as a file, for example metallb-cm.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250
kubectl apply -f metallb-cm.yaml
Install MetalLB with these yaml files (note that the metallb-system namespace must exist before the configmap above can be applied, so run these first if you haven't yet):
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/metallb.yaml
source - https://metallb.universe.tf/installation/#installation-by-manifest
Ingress
Your ingress controller will need a service of type LoadBalancer; use its external IP as the external IP you point your DNS at.
Run kubectl get service -A, look for your ingress service, and check that it has an external IP and does not say pending.
I will do my best to answer any of your follow up questions. Good Luck!

Where do I edit the helm chart if I want to change the port used by alertmanager-operated service in kubernetes?

I am installing the below helm package on my K8s cluster
https://github.com/prometheus-community/helm-charts/releases/tag/kube-prometheus-stack-21.0.0
I've got it locally, and when I deploy it, it creates everything including a service called alertmanager-operated. It's listening on TCP port 9093 and I need to change this. I don't see where this can be configured in the values.yaml or anywhere else in the package.
It's here. Your values.yaml can have:
...
alertmanager:
  service:
    port: <your port #>
Follow-up on your comment: "... can't tell how the alertmanager-operated service gets created and how to configure it"
Here's a good source for a quick understanding of the various k8s service types. For more configuration details, check out the official documentation. Set the values according to your needs and k8s will create the service for you when you apply the chart.
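After applying the chart with the new values, you can verify the resulting port with something like the below (the monitoring namespace is an assumption; use whichever namespace you installed the chart into):
kubectl get svc -n monitoring | grep alertmanager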

Kubernetes local cluster Pod hostPort - application not accessible

I am trying to access a web api deployed into my local Kubernetes cluster running on my laptop (Docker -> Settings -> Enable Kubernetes). The below is my Pod Spec YAML.
kind: Pod
apiVersion: v1
metadata:
  name: test-api
  labels:
    app: test-api
spec:
  containers:
  - name: testapicontainer
    image: myprivaterepo/testapi:latest
    ports:
    - name: web
      hostPort: 55555
      containerPort: 80
      protocol: TCP
kubectl get pods shows test-api running. However, when I try to connect to it using http://localhost:55555/testapi/index from my laptop, I do not get a response. But I can access the application from a container in a different pod within the cluster (I did a kubectl exec -it into a different container), using the URL http://<test-api pod cluster IP>/testapi/index. Why can't I access the application using the localhost:hostport URL?
I'd say that this is strongly not recommended.
According to k8s docs: https://kubernetes.io/docs/concepts/configuration/overview/#services
Don't specify a hostPort for a Pod unless it is absolutely necessary. When you bind a Pod to a hostPort, it limits the number of places the Pod can be scheduled, because each <hostIP, hostPort, protocol> combination must be unique. If you don't specify the hostIP and protocol explicitly, Kubernetes will use 0.0.0.0 as the default hostIP and TCP as the default protocol.
If you only need access to the port for debugging purposes, you can use the apiserver proxy or kubectl port-forward.
If you explicitly need to expose a Pod's port on the node, consider using a NodePort Service before resorting to hostPort.
So... is the hostPort really necessary in your case? Or would a NodePort Service (sketched below) solve it?
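A minimal sketch of such a NodePort Service for the Pod above (the nodePort value is just an example within the default 30000-32767 range):
apiVersion: v1
kind: Service
metadata:
  name: test-api
spec:
  type: NodePort
  selector:
    app: test-api          # matches the Pod's label
  ports:
  - port: 80               # service port inside the cluster
    targetPort: 80         # the Pod's containerPort
    nodePort: 30555        # reachable on <any node IP>:30555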
If it really is necessary, then you could try using the IP returned by the command:
kubectl get nodes -o wide
http://ip-from-the-command:55555/testapi/index
Also, another test that may help your troubleshooting is checking whether your app is accessible on the Pod IP.
UPDATE
I've done some tests locally and understood better what the documentation is trying to explain. Let me go through my test:
First I've created a Pod with hostPort: 55555, I've done that with a simple nginx.
Then I've listed my Pods and saw that this one was running on one of my specific Nodes.
Afterwards I tried to access the Pod on port 55555 through my master node's IP and through another node's IP, without success; but when accessing it through the IP of the Node where the Pod was actually running, it worked.
So the "issue" (and that's actually why this approach is not recommended) is that the Pod is accessible only through that specific Node's IP. If it restarts and starts on a different Node, the IP will also change.

Why can't access my gRPC REST service that is running in Minikube?

I've been learning Kubernetes recently and just came across this small issue. For some sanity checks, here is the functionality of my grpc app running locally:
> docker run -p 8080:8080 -it olamai/simulation:0.0.1
< omitted logs >
> curl localhost:8080/v1/todo/all
{"api":"v1","toDos":[{}]}
So it works! All I want to do now is deploy it in Minikube and expose the port so I can make calls to it. My end goal is to deploy it to a GKE or Azure cluster and make calls to it from there (again, just to learn and get the hang of everything.)
Here is the yaml I'm using to deploy to minikube
And this is what I run to deploy it on minikube
> kubectl create -f deployment.yaml
I then run this to get the url
> minikube service sim-service --url
http://192.168.99.100:30588
But this is what happens when I make a call to it
> curl http://192.168.99.100:30588/v1/todo/all
curl: (7) Failed to connect to 192.168.99.100 port 30588: Connection refused
What am I doing wrong here?
EDIT: I figured it out, and you should be able to see the update in the linked file. I had the pull policy set to Never, so the image was out of date 🤦
I have a new question now... I'm now able to just create the deployment in minikube (no NodePort) and still make calls to the api... shouldn't the deployment need a NodePort service to expose ports?
I checked your yaml file and it works just fine. But I noticed that you put two types for your service, LoadBalancer and also NodePort, which is not needed.
If you check the definition of LoadBalancer in this documentation, you will see:
LoadBalancer: Exposes the service externally using a cloud provider's load balancer. NodePort and ClusterIP services, to which the external load balancer will route, are automatically created.
To answer your next question: you probably put type: LoadBalancer in your deployment yaml file; that's why you are able to see a NodePort anyway.
If you put type: ClusterIP in your yaml, then the service will be exposed only within the cluster, and you won't be able to reach your service from outside the cluster.
From the same documentation:
ClusterIP: Exposes the service on a cluster-internal IP. Choosing this value makes the service only reachable from within the cluster. This is the default ServiceType.
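As a minimal illustration (the service name is taken from the question above; the selector and port are assumptions), the only difference between the two behaviours is the type field of the Service:
apiVersion: v1
kind: Service
metadata:
  name: sim-service
spec:
  type: LoadBalancer   # change to ClusterIP to keep it cluster-internal only
  selector:
    app: simulation    # assumed label on the Deployment's Pods
  ports:
  - port: 8080
    targetPort: 8080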