How to create ClusterPodMonitoring in GCP? - kubernetes

I'm trying to follow their docs and create this pod monitoring
i apply it and i see nothing in metrics
what am i doing wrong?
apiVersion: monitoring.googleapis.com/v1
kind: ClusterPodMonitoring
metadata:
name: monitoring
spec:
selector:
matchLabels:
app: blah
namespaceSelector:
any: true
endpoints:
- port: metrics
interval: 30s

As mentioned in the offical documnentation:
The following manifest defines a PodMonitoring resource, prom-example, in the NAMESPACE_NAME namespace. The resource uses a Kubernetes label selector to find all pods in the namespace that have the label app with the value prom-example. The matching pods are scraped on a port named metrics, every 30 seconds, on the /metrics HTTP path.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
name: prom-example
spec:
selector:
matchLabels:
app: prom-example
endpoints:
- port: metrics
interval: 30s
To apply this resource, run the following command:
kubectl -n NAMESPACE_NAME apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.5.0/examples/pod-monitoring.yaml
Also check the document on Obeserving your GKE clusters.
UPDATE:
After applying the manifests, the managed collection will be running but no metrics will be generated. You must deploy a PodMonitoring resource that scrapes a valid metrics endpoint to see any data in the Query UI.
Check the logs by running the below commands:
kubectl logs -f -ngmp-system -lapp.kubernetes.io/part-of=gmp
kubectl logs -f -ngmp-system -lapp.kubernetes.io/name=collector -c prometheus
If you see any error follow this link to troubleshoot.

Related

how to restrict a pod to connect only to 2 pods using networkpolicy and test connection in k8s in simple way?

Do I still need to expose pod via clusterip service?
There are 3 pods - main, front, api. I need to allow ingress+egress connection to main pod only from the pods- api and frontend. I also created service-main - service that exposes main pod on port:80.
I don't know how to test it, tried:
k exec main -it -- sh
netcan -z -v -w 5 service-main 80
and
k exec main -it -- sh
curl front:80
The main.yaml pod:
apiVersion: v1
kind: Pod
metadata:
labels:
app: main
item: c18
name: main
spec:
containers:
- image: busybox
name: main
command:
- /bin/sh
- -c
- sleep 1d
The front.yaml:
apiVersion: v1
kind: Pod
metadata:
labels:
app: front
name: front
spec:
containers:
- image: busybox
name: front
command:
- /bin/sh
- -c
- sleep 1d
The api.yaml
apiVersion: v1
kind: Pod
metadata:
labels:
app: api
name: api
spec:
containers:
- image: busybox
name: api
command:
- /bin/sh
- -c
- sleep 1d
The main-to-front-networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: front-end-policy
spec:
podSelector:
matchLabels:
app: main
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: front
ports:
- port: 8080
egress:
- to:
- podSelector:
matchLabels:
app: front
ports:
- port: 8080
What am I doing wrong? Do I still need to expose main pod via service? But should not network policy take care of this already?
Also, do I need to write containerPort:80 in main pod? How to test connectivity and ensure ingress-egress works only for main pod to api, front pods?
I tried the lab from ckad prep course, it had 2 pods: secure-pod and web-pod. There was issue with connectivity, the solution was to create network policy and test using netcat from inside the web-pod's container:
k exec web-pod -it -- sh
nc -z -v -w 1 secure-service 80
connection open
UPDATE: ideally I want answers to these:
a clear explanation of the diff btw service and networkpolicy.
If both service and netpol exist - what is the order of evaluation that the traffic/request goes thru? It first goes thru netpol then service? Or vice versa?
if I want front and api pods to send/receive traffic to main - do I need separate services exposing front and api pods?
Network policies and services are two different and independent Kubernetes resources.
Service is:
An abstract way to expose an application running on a set of Pods as a network service.
Good explanation from the Kubernetes docs:
Kubernetes Pods are created and destroyed to match the state of your cluster. Pods are nonpermanent resources. If you use a Deployment to run your app, it can create and destroy Pods dynamically.
Each Pod gets its own IP address, however in a Deployment, the set of Pods running in one moment in time could be different from the set of Pods running that application a moment later.
This leads to a problem: if some set of Pods (call them "backends") provides functionality to other Pods (call them "frontends") inside your cluster, how do the frontends find out and keep track of which IP address to connect to, so that the frontend can use the backend part of the workload?
Enter Services.
Also another good explanation in this answer.
For production you should use a workload resources instead of creating pods directly:
Pods are generally not created directly and are created using workload resources. See Working with Pods for more information on how Pods are used with workload resources.
Here are some examples of workload resources that manage one or more Pods:
Deployment
StatefulSet
DaemonSet
And use services to make requests to your application.
Network policies are used to control traffic flow:
If you want to control traffic flow at the IP address or port level (OSI layer 3 or 4), then you might consider using Kubernetes NetworkPolicies for particular applications in your cluster.
Network policies target pods, not services (an abstraction). Check this answer and this one.
Regarding your examples - your network policy is correct (as I tested it below). The problem is that your cluster may not be compatible:
For Network Policies to take effect, your cluster needs to run a network plugin which also enforces them. Project Calico or Cilium are plugins that do so. This is not the default when creating a cluster!
Test on kubeadm cluster with Calico plugin -> I created similar pods as you did, but I changed container part:
spec:
containers:
- name: main
image: nginx
command: ["/bin/sh","-c"]
args: ["sed -i 's/listen .*/listen 8080;/g' /etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'"]
ports:
- containerPort: 8080
So NGINX app is available at the 8080 port.
Let's check pods IP:
user#shell:~$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
api 1/1 Running 0 48m 192.168.156.61 example-ubuntu-kubeadm-template-2 <none> <none>
front 1/1 Running 0 48m 192.168.156.56 example-ubuntu-kubeadm-template-2 <none> <none>
main 1/1 Running 0 48m 192.168.156.52 example-ubuntu-kubeadm-template-2 <none> <none>
Let's exec into running main pod and try to make request to the front pod:
root#main:/# curl 192.168.156.61:8080
<!DOCTYPE html>
...
<title>Welcome to nginx!</title>
It is working.
After applying your network policy:
user#shell:~$ kubectl apply -f main-to-front.yaml
networkpolicy.networking.k8s.io/front-end-policy created
user#shell:~$ kubectl exec -it main -- bash
root#main:/# curl 192.168.156.61:8080
...
Not working anymore, so it means that network policy is applied successfully.
Nice option to get more information about applied network policy is to run kubectl describe command:
user#shell:~$ kubectl describe networkpolicy front-end-policy
Name: front-end-policy
Namespace: default
Created on: 2022-01-26 15:17:58 +0000 UTC
Labels: <none>
Annotations: <none>
Spec:
PodSelector: app=main
Allowing ingress traffic:
To Port: 8080/TCP
From:
PodSelector: app=front
Allowing egress traffic:
To Port: 8080/TCP
To:
PodSelector: app=front
Policy Types: Ingress, Egress

kubernetes "unable to get metrics"

I am trying to autoscale a deployment and a statefulset, by running respectivly these two commands:
kubectl autoscale statefulset mysql --cpu-percent=50 --min=1 --max=10
kubectl expose deployment frontend --type=LoadBalancer --name=frontend
Sadly, on the minikube dashboard, this error appears under both services:
failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Searching online I read that it might be a dns error, so I checked but CoreDNS seems to be running fine.
Both workloads are nothing special, this is the 'frontend' deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
labels:
app: frontend
spec:
replicas: 3
selector:
matchLabels:
app: frontend
template:
metadata:
labels:
app: frontend
spec:
containers:
- name: frontend
image: hubuser/repo
ports:
- containerPort: 3000
Has anyone got any suggestions?
First of all, could you please verify if the API is working fine? To do so, please run kubectl get --raw /apis/metrics.k8s.io/v1beta1.
If you get an error similar to:
“Error from server (NotFound):”
Please follow these steps:
1.- Remove all the proxy environment variables from the kube-apiserver manifest.
2.- In the kube-controller-manager-amd64, set --horizontal-pod-autoscaler-use-rest-clients=false
3.- The last scenario is that your metric-server add-on is disabled by default. You can verify it by using:
$ minikube addons list
If it is disabled, you will see something like metrics-server: disabled.
You can enable it by using:
$minikube addons enable metrics-server
When it is done, delete and recreate your HPA.
You can use the following thread as a reference.

"Failed to update endpoint default/myservice: Operation cannot be fulfilled on endpoints "myservice":

I keep getting the below error inconsistently on one of my services' endpoint object. : "Failed to update endpoint default/myservice: Operation cannot be fulfilled on endpoints "myservice": the object has been modified; please apply your changes to the latest version and try again". I am sure I am not editing the endpoint object manually because all my Kubernetes objects are deployed through helm3 charts. But it keeps giving the same error. It goes away if I delete and recreate the service. Please help/give any leads as to what could be the issue.
Below is my service.yml object from the cluster:
kind: Service
apiVersion: v1
metadata:
name: myservice
namespace: default
selfLink: /api/v1/namespaces/default/services/myservice
uid: 4af68af5-4082-4ffb-b11b-641d16b28f31
resourceVersion: '1315842'
creationTimestamp: '2020-08-13T11:00:53Z'
labels:
app: myservice
app.kubernetes.io/managed-by: Helm
chart: myservice-1.0.0
heritage: Helm
release: vanilla
annotations:
meta.helm.sh/release-name: vanilla
meta.helm.sh/release-namespace: default
spec:
ports:
- name: http
protocol: TCP
port: 5000
targetPort: 5000
selector:
app: myservice
clusterIP: 10.0.225.85
type: ClusterIP
sessionAffinity: None
status:
loadBalancer: {}
Inside the Kubernetes system is a control loop which evaluates the selector of every Service and saves the results into a corresponding Endpoints object. So a good place to debug if your service side is fine is to look at the Pods being selected by the Service. Selector labels should be the labels defined on pods.
kubectl get pods -l app=myservice
If you get results, look at the RESTARTS column if pods are restarting, if pods are restarting there could be intermittent connectivity issues.
If you are not getting results it could be due to wrong selector labels. Verify labels on pods by running the command
kubectl get pods -A --show-labels
A good point of reference is https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/
It's common behavior and might happen when you try to deploy resources by copy-pasting manifests including metadata fields like creationTimeStamp, resourceVersion, selfLink etc.
Those fields are generated before the object is persisted. It appears when you attempt to update the resource that has been already updated and the version has changed so it refuses to update it. The solution is to check your yamls and apply must-have objects without specifying fields populated by the system.

How to configure a prometheus target for kubelet metrics

I would like to plot in Grafana, the metrics for the readiness/liveness probes for some of my pods. Currently, the way I am deploying prometheus in my cluster is using:
helm install prometheus stable/prometheus -n prometheus
I am able to see all standard metrics by going to the prometheus UI, but I am trying to figure out how to get the probes metrics. Apparently the kubelet expose these metrics in /metrics/probes, but I don't know how to configure them. Moreover, I noted that apparently the "standard" metrics are grabbed from the kubernetes api-server on the /metrics/ path, but so far I haven't configured any path nor any config file (I just run the above command to install prometheus). I am assuming that this /metrics/ path is hardcoded somewhere in the helm chart repo, but since I want to get the metrics for the kubelets, this might be trickier, as my understanding is that he api-server lives in the master k8s node, and the kubelet only runs on the worker nodes (so I have no idea where to point the /metrics/probes path).
Use Prometheus Operator and create ServiceMonitor in which you can specify the endpoints, ports exposed by kubelet or any other component. Prometheus will start scraping the endpoints for metrics.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: kubelet
labels:
k8s-app: kubelet
spec:
jobLabel: k8s-app
endpoints:
- port: https-metrics
scheme: https
interval: 30s
tlsConfig:
insecureSkipVerify: true
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
- port: https-metrics
scheme: https
path: /metrics/cadvisor
interval: 30s
honorLabels: true
tlsConfig:
insecureSkipVerify: true
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
selector:
matchLabels:
k8s-app: kubelet
namespaceSelector:
matchNames:
- kube-system

Google Kubernetes Ingress health check always failing

I have configured a web application pod exposed via apache on port 80. I'm unable to configure a service + ingress for accessing from the internet. The issue is that the backend services always report as UNHEALTHY.
Pod Config:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
labels:
name: webapp
name: webapp
namespace: my-app
spec:
replicas: 1
selector:
matchLabels:
name: webapp
template:
metadata:
labels:
name: webapp
spec:
containers:
- image: asia.gcr.io/my-app/my-app:latest
name: webapp
ports:
- containerPort: 80
name: http-server
Service Config:
apiVersion: v1
kind: Service
metadata:
name: webapp-service
spec:
type: NodePort
selector:
name: webapp
ports:
- protocol: TCP
port: 50000
targetPort: 80
Ingress Config:
kind: Ingress
metadata:
name: webapp-ingress
spec:
backend:
serviceName: webapp-service
servicePort: 50000
This results in backend services reporting as UNHEALTHY.
The health check settings:
Path: /
Protocol: HTTP
Port: 32463
Proxy protocol: NONE
Additional information: I've tried a different approach of exposing the deployment as a load balancer with external IP and that works perfectly. When trying to use a NodePort + Ingress, this issue persists.
With GKE, the health check on the Load balancer is created automatically when you create the ingress. Since the HC is created automatically, so are the firewall rules.
Since you have no readinessProbe configured, the LB has a default HC created (the one you listed). To debug this properly, you need to isolate where the point of failure is.
First, make sure your pod is serving traffic properly;
kubectl exec [pod_name] -- wget localhost:80
If the application has curl built in, you can use that instead of wget.
If the application has neither wget or curl, skip to the next step.
get the following output and keep track of the output:
kubectl get po -l name=webapp -o wide
kubectl get svc webapp-service
You need to keep the service and pod clusterIPs
SSH to a node in your cluster and run sudo toolbox bash
Install curl:
apt-get install curl`
Test the pods to make sure they are serving traffic within the cluster:
curl -I [pod_clusterIP]:80
This needs to return a 200 response
Test the service:
curl -I [service_clusterIP]:80
If the pod is not returning a 200 response, the container is either not working correctly or the port is not open on the pod.
if the pod is working but the service is not, there is an issue with the routes in your iptables which is managed by kube-proxy and would be an issue with the cluster.
Finally, if both the pod and the service are working, there is an issue with the Load balancer health checks and also an issue that Google needs to investigate.
As Patrick mentioned, the checks will be created automatically by GCP.
By default, GKE will use readinessProbe.httpGet.path for the health check.
But if there is no readinessProbe configured, then it will just use the root path /, which must return an HTTP 200 (OK) response (and that's not always the case, for example, if the app redirects to another path, then the GCP health check will fail).