Unable to fully collect metrics, when installing metric-server

I have installed the metrics server on Kubernetes, but it's not working and it logs:
unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:xxx: unable to fetch metrics from Kubelet ... (X.X): Get https:....: x509: cannot validate certificate for 1x.x.
x509: certificate signed by unknown authority
I was able to get metrics once I modified the deployment YAML and added:
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
This now collects metrics, and kubectl top node returns results,
but the logs still show:
E1120 11:58:45.624974 1 reststorage.go:144] unable to fetch pod metrics for pod dev/pod-6bffbb9769-6z6qz: no metrics known for pod
E1120 11:58:45.625289 1 reststorage.go:144] unable to fetch pod metrics for pod dev/pod-6bffbb9769-rzvfj: no metrics known for pod
E1120 12:00:06.462505 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-1x.x.x.eu-west-1.compute.internal: unable to get CPU for container ...discarding data: missing cpu usage metric, unable to fully scrape metrics from source
So, my questions:
1) All this works on minikube, but not on my dev cluster. Why would that be?
2) In production I don't want to use --kubelet-insecure-tls, so can someone please explain why this issue arises, or point me to some resource?

Kubeadm generates the kubelet certificates at /var/lib/kubelet/pki, and those certificates (kubelet.crt and kubelet.key) are signed by a different CA from the one used to generate all the other certificates at /etc/kubernetes/pki.
You need to regenerate the kubelet certificates so that they are signed by your root CA (/etc/kubernetes/pki/ca.crt).
You can use openssl or cfssl to generate the new certificates (I am using cfssl):
$ mkdir certs; cd certs
$ cp /etc/kubernetes/pki/ca.crt ca.pem
$ cp /etc/kubernetes/pki/ca.key ca-key.pem
Create a file kubelet-csr.json:
{
"CN": "kubernetes",
"hosts": [
"127.0.0.1",
"<node_name>",
"kubernetes",
"kubernetes.default",
"kubernetes.default.svc",
"kubernetes.default.svc.cluster",
"kubernetes.default.svc.cluster.local"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [{
"C": "US",
"ST": "NY",
"L": "City",
"O": "Org",
"OU": "Unit"
}]
}
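Since this CSR file has to be produced once per node, a small script can template it. The following is only a sketch: the function name kubelet_csr and the optional node_ip argument are illustrative assumptions, not part of cfssl. Adding the node's IP to hosts is also an assumption, motivated by the --kubelet-preferred-address-types=InternalIP flag in the question, which makes metrics-server connect to the kubelet by IP.

```python
import json


def kubelet_csr(node_name, node_ip=None):
    """Build a cfssl CSR document shaped like kubelet-csr.json for one node."""
    hosts = ["127.0.0.1", node_name]
    if node_ip:
        # Assumption: include the node IP so that clients connecting by IP
        # find a matching IP SAN in the certificate.
        hosts.append(node_ip)
    hosts += [
        "kubernetes",
        "kubernetes.default",
        "kubernetes.default.svc",
        "kubernetes.default.svc.cluster",
        "kubernetes.default.svc.cluster.local",
    ]
    return {
        "CN": "kubernetes",
        "hosts": hosts,
        "key": {"algo": "rsa", "size": 2048},
        "names": [{"C": "US", "ST": "NY", "L": "City", "O": "Org", "OU": "Unit"}],
    }


if __name__ == "__main__":
    # Hypothetical node name and IP, for illustration only.
    print(json.dumps(kubelet_csr("node-1", "10.0.0.11"), indent=2))
```

Redirecting the output to kubelet-csr.json gives a file that can be passed to the cfssl gencert step.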
Create a ca-config.json file:
{
"signing": {
"default": {
"expiry": "8760h"
},
"profiles": {
"kubernetes": {
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
],
"expiry": "8760h"
}
}
}
}
Now generate the new certificates using the above files:
$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
--config=ca-config.json -profile=kubernetes \
kubelet-csr.json | cfssljson -bare kubelet
Replace the old certificates with the newly generated ones:
$ scp kubelet.pem <nodeip>:/var/lib/kubelet/pki/kubelet.crt
$ scp kubelet-key.pem <nodeip>:/var/lib/kubelet/pki/kubelet.key
Now restart the kubelet on the node so that the new certificates take effect:
$ systemctl restart kubelet
See the following ticket for the context of this issue:
https://github.com/kubernetes-incubator/metrics-server/issues/146
Hope this helps.

Related

Istio Pods Not Coming Up

I installed Istio 1.8.3 via the Rancher UI a long time ago. The Istio pods and ingress gateway pods were up and running, and my application was being served by Istio.
Recently we upgraded the Kubernetes cluster from 1.21 to 1.22 and then from 1.22 to 1.23.
Whenever we restarted the kubelet, the Istio pods came up with no issues.
Now, because of a few issues, we rebooted the node and the Istio pods restarted. They are in the Running state, but the readiness probe is failing.
The error I was able to find is:
failed to list CRDs: the server could not find the requested resource
Below are the full logs of the istiod pod:
stream logs failed container "discovery" in pod "istiod-5fbc9568cd-qgqkk" is waiting to start: ContainerCreating for istio-system/istiod-5fbc9568cd-qgqkk (discovery)
2022-06-27T05:35:32.772949Z info FLAG: --log_rotate_max_age="30"
2022-06-27T05:35:32.772952Z info FLAG: --log_rotate_max_backups="1000"
2022-06-27T05:35:32.772955Z info FLAG: --log_rotate_max_size="104857600"
2022-06-27T05:35:32.772958Z info FLAG: --log_stacktrace_level="default:none"
2022-06-27T05:35:32.772963Z info FLAG: --log_target="[stdout]"
2022-06-27T05:35:32.772971Z info FLAG: --mcpInitialConnWindowSize="1048576"
2022-06-27T05:35:32.772974Z info FLAG: --mcpInitialWindowSize="1048576"
2022-06-27T05:35:32.772977Z info FLAG: --mcpMaxMsgSize="4194304"
2022-06-27T05:35:32.772980Z info FLAG: --meshConfig="./etc/istio/config/mesh"
2022-06-27T05:35:32.772982Z info FLAG: --monitoringAddr=":15014"
2022-06-27T05:35:32.772985Z info FLAG: --namespace="istio-system"
2022-06-27T05:35:32.772988Z info FLAG: --networksConfig="/etc/istio/config/meshNetworks"
2022-06-27T05:35:32.772999Z info FLAG: --plugins="[authn,authz,health]"
2022-06-27T05:35:32.773002Z info FLAG: --profile="true"
2022-06-27T05:35:32.773005Z info FLAG: --registries="[Kubernetes]"
2022-06-27T05:35:32.773008Z info FLAG: --resync="1m0s"
2022-06-27T05:35:32.773011Z info FLAG: --secureGRPCAddr=":15012"
2022-06-27T05:35:32.773013Z info FLAG: --tlsCertFile=""
2022-06-27T05:35:32.773016Z info FLAG: --tlsKeyFile=""
2022-06-27T05:35:32.773018Z info FLAG: --trust-domain=""
2022-06-27T05:35:32.801976Z info klog Config not found: /var/run/secrets/remote/config[]
2022-06-27T05:35:32.803516Z info initializing mesh configuration ./etc/istio/config/mesh
2022-06-27T05:35:32.804499Z info mesh configuration: {
"proxyListenPort": 15001,
"connectTimeout": "10s",
"protocolDetectionTimeout": "0s",
"ingressClass": "istio",
"ingressService": "istio-ingressgateway",
"ingressControllerMode": "STRICT",
"enableTracing": true,
"defaultConfig": {
"configPath": "./etc/istio/proxy",
"binaryPath": "/usr/local/bin/envoy",
"serviceCluster": "istio-proxy",
"drainDuration": "45s",
"parentShutdownDuration": "60s",
"discoveryAddress": "istiod.istio-system.svc:15012",
"proxyAdminPort": 15000,
"controlPlaneAuthPolicy": "MUTUAL_TLS",
"statNameLength": 189,
"concurrency": 2,
"tracing": {
"zipkin": {
"address": "zipkin.istio-system:9411"
}
},
"envoyAccessLogService": {
},
"envoyMetricsService": {
},
"proxyMetadata": {
"DNS_AGENT": ""
},
"statusPort": 15020,
"terminationDrainDuration": "5s"
},
"outboundTrafficPolicy": {
"mode": "ALLOW_ANY"
},
"enableAutoMtls": true,
"trustDomain": "cluster.local",
"trustDomainAliases": [
],
"defaultServiceExportTo": [
"*"
],
"defaultVirtualServiceExportTo": [
"*"
],
"defaultDestinationRuleExportTo": [
"*"
],
"rootNamespace": "istio-system",
"localityLbSetting": {
"enabled": true
},
"dnsRefreshRate": "5s",
"certificates": [
],
"thriftConfig": {
},
"serviceSettings": [
],
"enablePrometheusMerge": true
}
2022-06-27T05:35:32.804516Z info version: 1.8.3-e282a1f927086cc046b967f0171840e238a9aa8c-Clean
2022-06-27T05:35:32.804699Z info flags:
2022-06-27T05:35:32.804706Z info initializing mesh networks
2022-06-27T05:35:32.804877Z info mesh networks configuration: {
"networks": {
}
}
2022-06-27T05:35:32.804938Z info initializing mesh handlers
2022-06-27T05:35:32.804949Z info initializing controllers
2022-06-27T05:35:32.804952Z info No certificates specified, skipping K8S DNS certificate controller
2022-06-27T05:35:32.814002Z error kube failed to list CRDs: the server could not find the requested resource
2022-06-27T05:35:33.816596Z error kube failed to list CRDs: the server could not find the requested resource
2022-06-27T05:35:35.819157Z error kube failed to list CRDs: the server could not find the requested resource
2022-06-27T05:35:39.821510Z error kube failed to list CRDs: the server could not find the requested resource
2022-06-27T05:35:47.823675Z error kube failed to list CRDs: the server could not find the requested resource
2022-06-27T05:36:03.827023Z error kube failed to list CRDs: the server could not find the requested resource
2022-06-27T05:36:35.829441Z error kube failed to list CRDs: the server could not find the requested resource
2022-06-27T05:37:35.831758Z error kube failed to list CRDs: the server could not find the requested resource

Upgrading Istio Pilot and Istio Ingress Gateway from 1.8.3 to 1.10.2 will work.
https://github.com/istio/istio/issues/34665

Istio 1.8.x is too old for Kubernetes 1.23. Refer to the Istio documentation for the supported combinations of Kubernetes and Istio versions, and upgrade Istio.
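One way to check this on a given cluster is to look at which versions of the apiextensions.k8s.io group the API server still serves. Kubernetes 1.22 removed apiextensions.k8s.io/v1beta1, which is presumably what an old Istio release asks for when it fails to list CRDs. The sketch below parses the discovery document returned by kubectl get --raw /apis/apiextensions.k8s.io; the sample document is hand-written for illustration and the helper name served_versions is hypothetical.

```python
import json


def served_versions(api_group_doc):
    """Return the versions listed in an APIGroup discovery document."""
    return [v["version"] for v in api_group_doc.get("versions", [])]


# Hand-written sample shaped like an APIGroup discovery response
# (not captured from a real cluster).
SAMPLE = json.loads("""
{
  "kind": "APIGroup",
  "apiVersion": "v1",
  "name": "apiextensions.k8s.io",
  "versions": [
    {"groupVersion": "apiextensions.k8s.io/v1", "version": "v1"}
  ],
  "preferredVersion": {"groupVersion": "apiextensions.k8s.io/v1", "version": "v1"}
}
""")

if __name__ == "__main__":
    versions = served_versions(SAMPLE)
    print("served:", versions)
    print("v1beta1 available:", "v1beta1" in versions)
```

If v1beta1 is missing from the list, any client that only knows the v1beta1 CRD API will get "the server could not find the requested resource".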

How to pass a flag to klog for structured logging

As part of Kubernetes 1.19, structured logging has been implemented.
I've read that Kubernetes' logging engine is klog and that structured logs follow this format:
<klog header> "<message>" <key1>="<value1>" <key2>="<value2>" ...
Cool! Even better, you can apparently pass a --logging-format=json flag to klog so that logs are generated in JSON directly:
{
"ts": 1580306777.04728,
"v": 4,
"msg": "Pod status updated",
"pod":{
"name": "nginx-1",
"namespace": "default"
},
"status": "ready"
}
Unfortunately, I haven't been able to find out how and where I should specify that --logging-format=json flag.
Is it a kubectl command? I'm using Azure's AKS.

--logging-format=json is a flag which needs to be set on all Kubernetes system components (kubelet, API server, controller manager and scheduler). You can check all flags here.
Unfortunately, you can't do this right now with AKS, as the control plane is managed by Microsoft.
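Logs that cannot be switched to JSON still follow the key/value text format quoted in the question, so they can be parsed on the client side. A rough sketch, assuming the klog header has already been stripped and that values contain no escaped quotes; parse_structured is a hypothetical helper, not part of klog:

```python
import shlex


def parse_structured(line):
    """Split '"<message>" key="value" ...' into a message and a dict of fields."""
    tokens = shlex.split(line)  # honours the double quotes around values
    message, fields = tokens[0], {}
    for token in tokens[1:]:
        key, _, value = token.partition("=")
        fields[key] = value
    return message, fields


if __name__ == "__main__":
    # Example line with the klog header already removed (assumption).
    msg, kv = parse_structured(
        '"Pod status updated" pod="default/nginx-1" status="ready"'
    )
    print(msg, kv)
```

shlex is used instead of a plain split on spaces because the message and the values may themselves contain spaces.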

Unable to retrieve custom metrics from prometheus-adapter

I am trying to experiment with scaling one of my application pods running on my Raspberry Pi Kubernetes cluster using HPA + custom metrics, but I ran into several issues. Despite reading the documentation at https://github.com/DirectXMan12/k8s-prometheus-adapter and troubleshooting for the past 2 days, I am still having difficulty grasping why some problems are happening.
Firstly, I built an ARM-compatible image of k8s-prometheus-adapter and installed it using Helm. I can confirm it's running properly by checking the pod logs.
I have also set up a script which sends the Raspberry Pis' temperatures to Pushgateway, and I can query them via the Prometheus query node_temp, which returns the following series:
node_temp{job="kube4"} 42
node_temp{job="kube1"} 44
node_temp{job="kube2"} 39
node_temp{job="kube3"} 40
Now I want to be able to scale one of my application pods using the above temperature values, as an experiment to better understand how it works.
Below is my k8s-prometheus-adapter Helm values.yml file:
image:
  repository: jaanhio/k8s-prometheus-adapter-arm
  tag: latest
logLevel: 7
prometheus:
  url: http://10.17.0.12
rules:
  default: false
  custom:
  - seriesQuery: 'etcd_object_counts'
    resources:
      template: <<.Resource>>
    name:
      as: "etcd_object"
    metricsQuery: count(etcd_object_counts)
  - seriesQuery: 'node_temp'
    resources:
      template: <<.Resource>>
    name:
      as: "node_temp"
    metricsQuery: count(node_temp)
After installing via Helm, I ran kubectl get apiservices and can see v1beta1.custom.metrics.k8s.io listed.
I then ran kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq and got the following:
{
"kind": "APIResourceList",
"apiVersion": "v1",
"groupVersion": "custom.metrics.k8s.io/v1beta1",
"resources": [
{
"name": "jobs.batch/node_temp",
"singularName": "",
"namespaced": true,
"kind": "MetricValueList",
"verbs": [
"get"
]
},
{
"name": "jobs.batch/etcd_object",
"singularName": "",
"namespaced": true,
"kind": "MetricValueList",
"verbs": [
"get"
]
},
]
I then tried to query the value of the registered node_temp metric using kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/jobs/*/node_temp but got the following response:
Error from server (InternalError): Internal error occurred: unable to list matching resources
Questions:
Why is the node_temp metric associated with the jobs.batch resource type?
Why am I not able to retrieve the value of the metric via kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/jobs/*/node_temp?
What is a definitive way of figuring out the path of the query, e.g. /apis/custom.metrics.k8s.io/v1beta1/jobs/*/node_temp? I resorted to trial and error until I got something of a response. I also see other paths with namespaces in the query, e.g. /apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/foo_metrics
Any help and advice will be greatly appreciated!

Why is the node_temp metric associated with the jobs.batch resource type?
It picks up the labels attached to the Prometheus metrics and tries to interpret them as Kubernetes resources; in this case you clearly have a job label (job="kube4"), which it maps to jobs.batch.
Why am I not able to retrieve the value of the metric via kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/jobs/*/node_temp?
Metrics are namespaced (see "namespaced": true), so you'll need /apis/custom.metrics.k8s.io/v1beta1/namespaces/*/jobs/*/node_temp
What is a definitive way of figuring out the path of the query, e.g. /apis/custom.metrics.k8s.io/v1beta1/jobs/*/node_temp? I resorted to trial and error until I got something of a response. I also see other paths with namespaces in the query, e.g. /apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/foo_metrics
Check https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/custom-metrics-api.md#api-paths
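The path shapes described in that document can be summarised in a few lines. This is a sketch of my reading of the design proposal, not code from the adapter; the helper names and the "default" namespace in the usage note are illustrative assumptions.

```python
API_ROOT = "/apis/custom.metrics.k8s.io/v1beta1"


def metric_path(resource, metric, namespace=None, name="*"):
    """Path for a metric attached to objects of a given resource type.

    With a namespace the resource is treated as namespaced; without one
    it is treated as cluster-scoped. name="*" selects all objects.
    """
    if namespace is not None:
        return f"{API_ROOT}/namespaces/{namespace}/{resource}/{name}/{metric}"
    return f"{API_ROOT}/{resource}/{name}/{metric}"


def namespace_metric_path(namespace, metric):
    """Path for a metric that describes the namespace itself."""
    return f"{API_ROOT}/namespaces/{namespace}/metrics/{metric}"


if __name__ == "__main__":
    print(metric_path("jobs", "node_temp", namespace="default"))
    print(metric_path("nodes", "node_temp"))
    print(namespace_metric_path("default", "foo_metrics"))
```

The printed paths can be passed to kubectl get --raw. Because the discovery output lists node_temp as namespaced, the first form is the one that applies here.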

Unable to access vault ui while running vault in docker: 404 page not found

I am running vault in docker like:
$ docker run -it --rm -p 8200:8200 vault:0.9.1
I have unsealed the vault:
$ VAULT_ADDR=http://localhost:8200 VAULT_SKIP_VERIFY="true" vault operator unseal L6M8O7Xg7c8vBe3g35s25OWeruNDfaQzQ5g9UZ2bvGM=
Key Value
--- -----
Seal Type shamir
Initialized false
Sealed false
Total Shares 1
Threshold 1
Version 0.9.1
Cluster Name vault-cluster-52a8c4b5
Cluster ID 96ba7037-3c99-5b6e-272e-7bcd6e5cc45c
HA Enabled false
However, I can't access the UI http://localhost:8200/ui in firefox. The error is:
404 page not found
Do you know what I am doing wrong? Does the vault docker image in docker hub have UI compiled in it?

The Web UI was open-sourced in v0.10.0, so v0.9.1 doesn't have it. Here is the blog post announcing the release and the CHANGELOG for v0.10.0 (take a look at the FEATURES subsection).
To see the Web UI in a web browser, try running this command:
$ docker run -it --rm -p 8200:8200 vault:0.10.0
However, I would suggest using a more recent Vault version, as there have been many improvements and bug fixes in the meantime. Features have also been added to the Web UI, so if you follow the latest documentation, some of the things described there might not be available in older versions.
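If you pick image tags in a script, note that comparing version strings directly goes wrong here ("0.9.1" sorts after "0.10.0" as text). A minimal sketch of a numeric check against the 0.10.0 cut-off mentioned above; the function names are hypothetical and it assumes plain x.y.z tags without suffixes:

```python
def parse_version(text):
    """Turn '0.10.3' into (0, 10, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def has_web_ui(version):
    """True if the version is at or past 0.10.0, where the UI was open-sourced."""
    return parse_version(version) >= (0, 10, 0)


if __name__ == "__main__":
    for tag in ("0.9.1", "0.10.0", "0.10.3"):
        print(tag, has_web_ui(tag))
```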

I observed this behavior with Vault 0.10.3 (https://releases.hashicorp.com/vault/0.10.3/vault_0.10.3_linux_amd64.zip) when I put the setting that enables the UI at the very bottom of the Vault configuration file (e.g. config.json).
A config that returns a 404 error looks like the one below:
{
"listener": [{
"tcp": {
"address" : "0.0.0.0:8200",
"tls_disable" : 1
}
}],
"api_addr": "http://172.16.94.10:8200",
"storage": {
"consul" : {
"address" : "127.0.0.1:8500",
"path": "vault"
}
}
},
"max_lease_ttl": "10h",
"default_lease_ttl": "10h",
"ui":"true"
}
and one that works with Vault 0.10.3 has the ui setting at the very top of its configuration file:
{
"ui":"true",
"listener": [{
"tcp": {
"address" : "0.0.0.0:8200",
"tls_disable" : 1
}
}],
"api_addr": "http://172.16.94.10:8200",
"storage": {
"consul" : {
"address" : "127.0.0.1:8500",
"path": "vault"
}
}
},
"max_lease_ttl": "10h",
"default_lease_ttl": "10h"
}

Amazon EKS: generate/update kubeconfig via python script

When using Amazon's K8s offering, the EKS service, at some point you need to connect the Kubernetes API and configuration to the infrastructure established within AWS. In particular, we need a kubeconfig with proper credentials and URLs to connect to the k8s control plane provided by EKS.
The Amazon command-line tool aws provides a routine for this task:
aws eks update-kubeconfig --kubeconfig /path/to/kubecfg.yaml --name <EKS-cluster-name>
Question: do the same through Python/boto3
When looking at the Boto API documentation, I seem to be unable to spot the equivalent of the above-mentioned aws routine. Maybe I am looking in the wrong place.
Is there a ready-made function in boto to achieve this?
Otherwise, how would you approach this directly within Python (other than calling out to aws in a subprocess)?

There isn't a built-in method to do this, but you can build the configuration file yourself like this:
import boto3
import yaml

# Placeholders (not in the original answer): set these for your environment.
region = "<region>"
cluster_name = "<EKS-cluster-name>"
config_file = "/path/to/kubecfg.yaml"

# Set up the client
s = boto3.Session(region_name=region)
eks = s.client("eks")

# Get cluster details
cluster = eks.describe_cluster(name=cluster_name)
cluster_cert = cluster["cluster"]["certificateAuthority"]["data"]
cluster_ep = cluster["cluster"]["endpoint"]

# Build the cluster config hash
cluster_config = {
    "apiVersion": "v1",
    "kind": "Config",
    "clusters": [
        {
            "cluster": {
                "server": str(cluster_ep),
                "certificate-authority-data": str(cluster_cert),
            },
            "name": "kubernetes",
        }
    ],
    "contexts": [
        {
            "context": {
                "cluster": "kubernetes",
                "user": "aws",
            },
            "name": "aws",
        }
    ],
    "current-context": "aws",
    "preferences": {},
    "users": [
        {
            "name": "aws",
            "user": {
                "exec": {
                    "apiVersion": "client.authentication.k8s.io/v1alpha1",
                    "command": "heptio-authenticator-aws",
                    "args": ["token", "-i", cluster_name],
                }
            },
        }
    ],
}

# Write in YAML.
config_text = yaml.dump(cluster_config, default_flow_style=False)
with open(config_file, "w") as f:
    f.write(config_text)

This is explained in the "Create kubeconfig manually" section of https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html, which is in fact referenced from the boto3 EKS docs. The manual method there is very similar to @jaxxstorm's answer, except that it doesn't show the Python code you would need. However, it also does not assume the heptio authenticator (it shows the token and IAM authenticator approaches).

I faced the same problem and decided to implement it as a Python package.
It can be installed via
pip install eks-token
and then simply do
from eks_token import get_token
response = get_token(cluster_name='<value>')
More details and examples here

Amazon's aws tool is included in the python package awscli, so one option is to add awscli as a python dependency and just call it from python. The code below assumes that kubectl is installed (but you can remove the test if you want).
kubeconfig depends on ~/.aws/credentials
One challenge here is that the kubeconfig file generated by aws has a users section like this:
users:
- name: arn:aws:eks:someregion:1234:cluster/somecluster
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - --region
      - someregion
      - eks
      - get-token
      - --cluster-name
      - somecluster
      command: aws
So if you mount it into a container or move it to a different machine, you'll get this error when you try to use it:
Unable to locate credentials. You can configure credentials by running "aws configure".
Based on that user section, kubectl is running aws eks get-token and it's failing because the ~/.aws dir doesn't have the credentials that it had when the kubeconfig file was generated.
You could get around this by also staging the ~/.aws dir everywhere you want to use the kubeconfig file, but I have automation that takes a lone kubeconfig file as a parameter, so I'll be modifying the user section to include the necessary secrets as env vars.
Be aware that this makes it possible for whoever gets that kubeconfig file to use the secrets we've included for other things. Whether this is a problem will depend on how much power your aws user has.
Assume Role
If your cluster uses RBAC, you might need to specify which role you want for your kubeconfig file. The code below does this by first generating a separate set of creds and then using them to generate the kubeconfig file.
Role assumption has a timeout (I'm using 12 hours below), so you'll need to call the script again if you can't manage your mischief before the token times out.
The Code
You can generate the file like:
pip install awscli boto3 pyyaml sh
python mkkube.py > kubeconfig
...if you put the following in mkkube.py
from pathlib import Path
from tempfile import TemporaryDirectory
from time import time

import boto3
import yaml
from sh import aws, sh

aws_access_key_id = "AKREDACTEDAT"
aws_secret_access_key = "ubREDACTEDaE"
role_arn = "arn:aws:iam::1234:role/some-role"
cluster_name = "mycluster"
region_name = "someregion"

# assume a role that has access
sts = boto3.client(
    "sts",
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
)
assumed = sts.assume_role(
    RoleArn=role_arn,
    RoleSessionName="mysession-" + str(int(time())),
    DurationSeconds=(12 * 60 * 60),  # 12 hrs
)

# these will be different than the ones you started with
credentials = assumed["Credentials"]
access_key_id = credentials["AccessKeyId"]
secret_access_key = credentials["SecretAccessKey"]
session_token = credentials["SessionToken"]

# make sure our cluster actually exists
eks = boto3.client(
    "eks",
    aws_session_token=session_token,
    aws_access_key_id=access_key_id,
    aws_secret_access_key=secret_access_key,
    region_name=region_name,
)
clusters = eks.list_clusters()["clusters"]
if cluster_name not in clusters:
    raise RuntimeError(f"configured cluster: {cluster_name} not found among {clusters}")

with TemporaryDirectory() as kube:
    kubeconfig_path = Path(kube) / "config"

    # let awscli generate the kubeconfig
    result = aws(
        "eks",
        "update-kubeconfig",
        "--name",
        cluster_name,
        _env={
            "AWS_ACCESS_KEY_ID": access_key_id,
            "AWS_SECRET_ACCESS_KEY": secret_access_key,
            "AWS_SESSION_TOKEN": session_token,
            "AWS_DEFAULT_REGION": region_name,
            "KUBECONFIG": str(kubeconfig_path),
        },
    )

    # read the generated file
    with open(kubeconfig_path, "r") as f:
        kubeconfig_str = f.read()
    kubeconfig = yaml.load(kubeconfig_str, Loader=yaml.SafeLoader)

    # the generated kubeconfig assumes that upon use it will have access to
    # `~/.aws/credentials`, but maybe this filesystem is ephemeral,
    # so add the creds as env vars on the aws command in the kubeconfig
    # so that even if the kubeconfig is separated from ~/.aws it is still
    # useful
    users = kubeconfig["users"]
    for i in range(len(users)):
        kubeconfig["users"][i]["user"]["exec"]["env"] = [
            {"name": "AWS_ACCESS_KEY_ID", "value": access_key_id},
            {"name": "AWS_SECRET_ACCESS_KEY", "value": secret_access_key},
            {"name": "AWS_SESSION_TOKEN", "value": session_token},
        ]

    # write the updates to disk
    with open(kubeconfig_path, "w") as f:
        f.write(yaml.dump(kubeconfig))

    awsclipath = str(Path(sh("-c", "which aws").stdout.decode()).parent)
    kubectlpath = str(Path(sh("-c", "which kubectl").stdout.decode()).parent)
    pathval = f"{awsclipath}:{kubectlpath}"

    # test the modified file without a ~/.aws/ dir
    # this will throw an exception if we can't talk to the cluster
    sh(
        "-c",
        "kubectl cluster-info",
        _env={
            "KUBECONFIG": str(kubeconfig_path),
            "PATH": pathval,
            "HOME": "/no/such/path",
        },
    )

print(yaml.dump(kubeconfig))