Kubernetes metrics-server not working with Linkerd

I have a metrics-server and a horizontal pod autoscaler using this server, running on my cluster.
This works perfectly fine until I inject linkerd-proxies into the deployments in the namespace where my application is running. Running kubectl top pod in that namespace then results in an error: Metrics not available for pod <name>. However, nothing appears in the metrics-server pod's logs.
The metrics-server clearly works fine in other namespaces, because top works in every namespace but the meshed one.
At first I thought it could be because the proxies' resource requests/limits weren't set, but after re-running the injection with them set (kubectl get -n <namespace> deploy -o yaml | linkerd inject - --proxy-cpu-request "10m" --proxy-cpu-limit "1" --proxy-memory-request "64Mi" --proxy-memory-limit "256Mi" | kubectl apply -f -), the issue stays the same.
Is this a known problem, and are there any possible solutions?
PS: I have a kube-prometheus-stack running in a different namespace, and this seems to be able to scrape the pod metrics from the meshed pods just fine

The problem was apparently a bug in the cAdvisor stats provider when used with the CRI runtime: the linkerd-init containers keep producing metrics after they've terminated, which shouldn't happen. The metrics-server ignores stats from pods that contain containers reporting zero values (to avoid reporting invalid metrics, e.g. when a container is restarting or its metrics haven't been collected yet). You can follow up on the issue here. Possible solutions seem to be switching to another runtime, or enabling the PodAndContainerStatsFromCRI feature gate, which makes the internal CRI stats provider responsible instead of the cAdvisor one.
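As a sketch of the PodAndContainerStatsFromCRI workaround (assuming your nodes use a standard KubeletConfiguration file; the path varies by distribution, and your CRI runtime must support the required stats endpoints), the feature gate can be enabled in the kubelet config:

# /var/lib/kubelet/config.yaml (path may differ on your nodes)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodAndContainerStatsFromCRI: true

After restarting the kubelet on each node, container stats should come from the CRI runtime instead of cAdvisor.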

I'm able to use kubectl top on pods that have linkerd injected:
:; kubectl top pod -n linkerd --containers
POD NAME CPU(cores) MEMORY(bytes)
linkerd-destination-5cfbd7468-7l22t destination 2m 41Mi
linkerd-destination-5cfbd7468-7l22t linkerd-proxy 1m 13Mi
linkerd-destination-5cfbd7468-7l22t policy 1m 81Mi
linkerd-destination-5cfbd7468-7l22t sp-validator 1m 34Mi
linkerd-identity-fc9bb697-s6dxw identity 1m 33Mi
linkerd-identity-fc9bb697-s6dxw linkerd-proxy 1m 12Mi
linkerd-proxy-injector-668455b959-rlvkj linkerd-proxy 1m 13Mi
linkerd-proxy-injector-668455b959-rlvkj proxy-injector 1m 40Mi
So I don't think there's anything fundamentally incompatible between Linkerd and the Kubernetes metrics server.
I have noticed that I will sometimes see errors for the first ~1m after a pod starts, before the metrics server has gotten its initial state for the pod; but these error messages seem a little different from what you describe:
:; kubectl rollout restart -n linkerd deployment linkerd-destination
deployment.apps/linkerd-destination restarted
:; while ! kubectl top pod -n linkerd --containers linkerd-destination-6d974dd4c7-vw7nw ; do sleep 10 ; done
Error from server (NotFound): podmetrics.metrics.k8s.io "linkerd/linkerd-destination-6d974dd4c7-vw7nw" not found
Error from server (NotFound): podmetrics.metrics.k8s.io "linkerd/linkerd-destination-6d974dd4c7-vw7nw" not found
Error from server (NotFound): podmetrics.metrics.k8s.io "linkerd/linkerd-destination-6d974dd4c7-vw7nw" not found
Error from server (NotFound): podmetrics.metrics.k8s.io "linkerd/linkerd-destination-6d974dd4c7-vw7nw" not found
POD NAME CPU(cores) MEMORY(bytes)
linkerd-destination-6d974dd4c7-vw7nw destination 1m 25Mi
linkerd-destination-6d974dd4c7-vw7nw linkerd-proxy 1m 13Mi
linkerd-destination-6d974dd4c7-vw7nw policy 1m 18Mi
linkerd-destination-6d974dd4c7-vw7nw sp-validator 1m 19Mi
:; kubectl version --short
Client Version: v1.23.3
Server Version: v1.21.7+k3s1

Related

CockroachDB distributed workload on all nodes

I've deployed a CockroachDB cluster on Kubernetes using this guide:
https://github.com/cockroachlabs-field/kubernetes-examples/blob/master/SECURE.md
I deployed it with
$ helm install k8crdb --set Secure.Enabled=true cockroachdb/cockroachdb --namespace=thesis-crdb
Here is how it looks when I list it with $ helm list --namespace=thesis-crdb
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
k8crdb thesis-crdb 1 2021-01-29 20:18:25.5710691 +0100 CET deployed cockroachdb-5.0.4 20.2.4
Here is how it looks when I list it with $ kubectl get all --namespace=thesis-crdb
NAME READY STATUS RESTARTS AGE
pod/k8crdb-cockroachdb-0 1/1 Running 0 3h1m
pod/k8crdb-cockroachdb-1 1/1 Running 0 3h1m
pod/k8crdb-cockroachdb-2 1/1 Running 0 3h1m
pod/k8crdb-cockroachdb-init-j2h7t 0/1 Completed 0 3h1m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/k8crdb-cockroachdb ClusterIP None <none> 26257/TCP,8080/TCP 3h1m
service/k8crdb-cockroachdb-public ClusterIP 10.99.163.201 <none> 26257/TCP,8080/TCP 3h1m
NAME READY AGE
statefulset.apps/k8crdb-cockroachdb 3/3 3h1m
NAME COMPLETIONS DURATION AGE
job.batch/k8crdb-cockroachdb-init 1/1 33s 3h1m
Now I wanna simulate traffic to this cluster. First I access the pod with: $ kubectl exec -i -t -n thesis-crdb k8crdb-cockroachdb-0 -c db "--" sh -c "clear; (bash || ash || sh)"
Which gets me inside the first pod/node.
From here I initiate the workload
[root@k8crdb-cockroachdb-0 cockroach]# cockroach workload init movr 'postgresql://root@localhost:26257?sslmode=disable'
And then I run the workload for 5 minutes
[root@k8crdb-cockroachdb-0 cockroach]# cockroach workload run movr --duration=5m 'postgresql://root@localhost:26257?sslmode=disable'
I am aware that I'm running the workload on one node, but I was under the impression that the workload would be distributed among all nodes. However, when I monitor the performance with the CockroachDB console, I see that only the first node is doing any work and the other nodes are idle.
As you can see, the second (and third) node haven't had any workload at all. Is this just a visual glitch in the console, or how can I run the workload so it gets distributed evenly among all nodes in the cluster?
-UPDATE-
Yes, glad you brought up the cockroachdb-client-secure pod, because that's where I could no longer follow the guide. I tried it as they do in the guide: $ curl https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/client-secure.yaml | sed -e 's/serviceAccountName\: cockroachdb/serviceAccountName\: k8crdb-cockroachdb/g' | kubectl create -f -
But it throws this error:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1638 100 1638 0 0 4136 0 --:--:-- --:--:-- --:--:-- 4146
Error from server (Forbidden): error when creating "STDIN": pods "cockroachdb-client-secure" is forbidden: error looking up service account default/k8crdb-cockroachdb: serviceaccount "k8crdb-cockroachdb" not found
I also don't know if my certificates have been approved, because when I try this:
$ kubectl get csr k8crdb-cockroachdb-0 --namespace=thesis-crdb
It throws this:
Error from server (NotFound): certificatesigningrequests.certificates.k8s.io "k8crdb-cockroachdb-0" not found
And when I try to approve the certificate: $ kubectl certificate approve k8crdb-cockroachdb-0 --namespace=thesis-crdb
It throws:
Error from server (NotFound): certificatesigningrequests.certificates.k8s.io "k8crdb-cockroachdb-0" not found
Any idea how to proceed from here?
This is not a glitch. Nodes will only receive SQL traffic if clients connect to them and issue SQL statements. It seems like you're running the workload by logging in to one of the cockroach pods and directing it to connect to that pod on its local port, which means only that pod is going to receive queries. The cockroach workload subcommand takes an arbitrary number of pgurl strings and will balance load over all of them. Note also that k8crdb-cockroachdb-public represents a load balancer over all of the nodes in the cluster.
If you look at the guide you posted, it continues by describing how to deploy the cockroachdb-client-secure pod. If you were to run the workload from there, pointed at the load balancer with a connection string like the one below, the load would be spread across all of the nodes:
'postgres://root@k8crdb-cockroachdb-public?sslcert=cockroach-certs%2Fclient.root.crt&sslkey=cockroach-certs%2Fclient.root.key&sslrootcert=cockroach-certs%2Fca.crt&sslmode=verify-full'
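For illustration, and assuming the client-secure pod is deployed as in the guide (with the root client certificates mounted at cockroach-certs and the pod created in your namespace), running the workload from that pod against the load-balanced service would look roughly like this:

kubectl exec -it cockroachdb-client-secure --namespace=thesis-crdb -- \
  cockroach workload run movr --duration=5m \
  'postgres://root@k8crdb-cockroachdb-public:26257?sslcert=cockroach-certs%2Fclient.root.crt&sslkey=cockroach-certs%2Fclient.root.key&sslrootcert=cockroach-certs%2Fca.crt&sslmode=verify-full'

You can also pass several pgurl strings (for example one per pod, using the StatefulSet pod DNS names), and cockroach workload will balance connections across all of them.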
UPDATE
I'm not a k8s expert, but I think your issue creating the client pod relates to the namespace. The command currently assumes that everything is in the default namespace, but it appears that you're working in --namespace=thesis-crdb. Consider adding a namespace flag to the kubectl create -f - command (example below), or setting the namespace for the session:
kubectl config set-context --current --namespace=thesis-crdb
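For example, adding the namespace flag to the create command from the question would be a minimal change:

curl https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/client-secure.yaml \
  | sed -e 's/serviceAccountName\: cockroachdb/serviceAccountName\: k8crdb-cockroachdb/g' \
  | kubectl create --namespace=thesis-crdb -f -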

Unable to see Pods CPU and Memory utilization; graphs are missing in the Kubernetes dashboard

K8s VERSION = v1.18.6
I have deployed the Kubernetes dashboard using the following command and added a privileged user with which I logged into the dashboard.
However, I am not able to see Pods CPU and Memory utilization; the graphs are missing from the Kubernetes dashboard.
The Kubernetes Metrics Server is an aggregator of resource usage data in your cluster.
Deploy the Metrics Server with the following command:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
Verify that the metrics-server deployment is running the desired number of pods with the following command.
kubectl get deployment metrics-server -n kube-system
Output
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 6m
You can also validate it with the command below:
kubectl top nodes
If it shows node CPU utilisation, the metrics should then come up in the Dashboard as well.
Resource usage metrics are only available for K8s clusters once Metrics Server has been installed.
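If kubectl top still returns errors after the deployment is ready, a couple of extra checks can help confirm that the metrics API itself is registered and serving (this assumes the default APIService name installed by metrics-server):

kubectl get apiservice v1beta1.metrics.k8s.io
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes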

How to debug why my pods are pending in GCE

I'm trying to get a pod running on GCE. The pod has an init container, and is created by me applying a manifest with a deployment that creates 1 replica of the pod.
When I look at my workloads on the cloud console, I can see that under 'Active revisions' my deployment is in the state of 'Pods are pending', and under 'Managed pods', the status is 'PodsInitializing'.
The container logs are empty, and the audit logs contain a single entry for the creation of the deployment.
My pods seem to be stuck in the above state, and I'm not really sure why. How do I go about debugging that?
Edit:
kubectl get pods --namespace=my-namespace
Outputs:
NAME READY STATUS RESTARTS AGE
my-pod-v77jm 0/1 Init:0/1 0 55m
But when I run:
kubectl describe pod my-pod-v77jm
I get
Error from server (NotFound): pods "my-pod-v77jm" not found
If you have access to the kube API via kubectl:
Use describe to see details about the pod and its containers:
kubectl describe pod myPod --namespace mynamespace
To view container logs (including init containers):
kubectl logs myPod --namespace mynamespace -c initContainerName
You can get more information about pod statuses and how to debug init containers here
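Applied to the pod from the question (the describe in the edit most likely returned NotFound because the --namespace flag was omitted), that would be something like the following, where the init container name is a placeholder:

kubectl describe pod my-pod-v77jm --namespace=my-namespace
kubectl logs my-pod-v77jm --namespace=my-namespace -c <init-container-name>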

Kubernetes autoscaling: ScalingActive False

Trying to add autoscaling to my deployment, but getting ScalingActive False. Most answers are about DNS, Heapster, and Limits; I've checked all of those but still can't find a solution.
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
fetch Deployment/fetch <unknown>/50% 1 4 1 13m
kubectl cluster-info
Kubernetes master is running at --
addon-http-application-routing-default-http-backend is running at --
addon-http-application-routing-nginx-ingress is running at --
Heapster is running at --
KubeDNS is running at --
kubernetes-dashboard is running at --
(Screenshots of kubectl describe hpa, the deployment YAML, kubectl describe pod, kubectl top pod fetch-54f697989d-wczvn --namespace=default, the memory-based autoscaling YAML, and the HPA description were attached to the original post and are not reproduced here.)
PS: I tried to deploy the example which Azure provides and got the same result, so the YAML settings aren't the problem.
kubectl get hpa gives the same result: <unknown>/60%
I've experienced similar issues; my solution was to set the resources.requests.cpu section in the deployment config, so the current utilization percentage can be calculated against the requested resource values. Your event log messages also suggest that the resource request is not set, although your deployment YAML looks fine to me.
Let's double-check with the following steps.
First, verify the resources with this command:
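For example, a minimal container spec with the CPU request set might look like the snippet below, inside the pod template of your deployment (the container name, image, and values are illustrative, not taken from the question):

spec:
  containers:
  - name: fetch
    image: <your-image>
    resources:
      requests:
        cpu: 250m
      limits:
        cpu: 500m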
# kubectl top pod <your pod name> --namespace=<your pod running namespace>
You would also need to check the pod's requested CPU resources using the command below, to make sure it matches the config in your deployment YAML.
# kubectl describe pod <your pod name>
...
Requests:
cpu: 250m
...
I hope this helps you resolve your issue. ;)
This GitHub issue helped me: I just deployed the metrics server to my cluster and recreated the HPA.

Minikube got stuck when creating container

I recently started learning Kubernetes by using Minikube locally on my Mac. Previously, I was able to start a local Kubernetes cluster with Minikube 0.10.0, create a deployment, and view the Kubernetes dashboard.
Yesterday I deleted the cluster and redid everything from scratch. However, I found I could not get the assets deployed and could not view the dashboard. From what I saw, everything seemed to get stuck during container creation.
After I ran minikube start, it reported
Starting local Kubernetes cluster...
Kubectl is now configured to use the cluster.
When I ran kubectl get pods --all-namespaces, it reported (pay attention to the STATUS column):
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-addon-manager-minikube 0/1 ContainerCreating 0 51s
docker ps showed nothing:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
minikube status tells me the VM and cluster are running:
minikubeVM: Running
localkube: Running
When I tried to create a deployment and an autoscaler, I was told they were created successfully:
kubectl create -f configs
deployment "hello-minikube" created
horizontalpodautoscaler "hello-minikube-autoscaler" created
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default hello-minikube-661011369-1pgey 0/1 ContainerCreating 0 1m
default hello-minikube-661011369-91iyw 0/1 ContainerCreating 0 1m
kube-system kube-addon-manager-minikube 0/1 ContainerCreating 0 21m
When exposing the service, it said:
$ kubectl expose deployment hello-minikube --type=NodePort
service "hello-minikube" exposed
$ kubectl get service
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hello-minikube 10.0.0.32 <nodes> 8080/TCP 6s
kubernetes 10.0.0.1 <none> 443/TCP 22m
When I tried to access the service, I was told:
curl $(minikube service hello-minikube --url)
Waiting, endpoint for service is not ready yet...
docker ps still showed nothing. It looked to me like everything got stuck when creating containers. I tried some other ways to work around this issue:
Upgrading to minikube 0.11.0
Using the xhyve driver instead of the VirtualBox driver
Deleting everything cached, like ~/.minikube, ~/.kube, and the cluster, and retrying
None of them worked for me.
Kubernetes is still new to me and I would like to know:
How can I troubleshoot this kind of issue?
What could be the cause of this issue?
Any help is appreciated. Thanks.
It turned out to be a network problem in my case.
The pod status was "ContainerCreating", and I found that during container creation the Docker image is pulled from gcr.io, which is inaccessible in China (blocked by the GFW). The previous time it worked for me because I happened to be connected via a VPN.
I haven't tried minikube, but I do use Kubernetes. With the information provided it is difficult to pin down the cause of the issue. Your minikube has no problem creating resources, but ContainerCreating points to a problem with the Docker daemon, improper communication between the kube API and the Docker daemon, or some problem with the kubelet.
You can try the following command:
kubectl describe po POD_NAME
This will show you the pod's events. Maybe they will provide a path to the root cause of the issue.
You may also check the logs of kubelet to get the events.
I had this problem on Windows, but it was related to an NTLM proxy. I deleted the minikube VM then recreated it with the correct proxy settings for my CNTLM installation:
minikube start \
--docker-env http_proxy=http://10.0.2.2:3128 \
--docker-env https_proxy=http://10.0.2.2:3128 \
--docker-env no_proxy=localhost,127.0.0.1,::1,192.168.99.100
See https://blog.alexellis.io/minikube-behind-proxy/
The HorizontalPodAutoscaler (HPA) requires Heapster in order to work, so you'll need to run Heapster in minikube. You can always debug these kinds of issues with minikube logs or interactively through the dashboard, opened via minikube dashboard.
You can find the steps to run heapster and grafana at https://github.com/kubernetes/heapster
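Depending on your minikube version, Heapster may also be available as a bundled addon, which is simpler than deploying it by hand; a sketch:

minikube addons list
minikube addons enable heapster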
For me, it took several minutes before I saw the ContainerCreating problem. After executing the following command:
systemctl status kube-controller-manager.service
I get this error:
Sync "default/redis-master-2229813293" failed with unable to create pods: No API token found for service account "default", retry after the token is automatically created and added to the service account.
There are two ways to solve this:
Set up the service account with a token
Remove ServiceAccount from the KUBE_ADMISSION_CONTROL setting of the API server
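As a sketch of the second option, on installations that configure the API server through an options file such as /etc/kubernetes/apiserver (the path and flag spelling vary by distribution and version), the change would look roughly like this:

# before
KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,ResourceQuota"
# after: ServiceAccount removed from the list
KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,LimitRanger,ResourceQuota"

Restart the kube-apiserver service after editing the file.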