Can't do a kubectl get on the TokenReview kind - kubernetes

I'm searching for out-of-date APIs in my k8s cluster, but when I try to do kubectl get TokenReview --all-namespaces, it comes back with Error from server (MethodNotAllowed): the server does not allow this method on the requested resource
I was expecting a list of YAML objects of kind "TokenReview", similar to the below.
I'm running k8s 1.21 for both server and client.
Anybody got any ideas? I'm not seeing anything in the k8s docs.

I raised this on GitHub, and this is what they said:
"TokenReview create requests are not persisted, they are answered with ephemeral responses. This means the token review API does not support read requests (get, list, watch, etc) or update/delete/patch requests, only create."
"All of the _Review resources are like this as well, subjectaccessreview, selfsubjectaccessreview, tokenreview, etc, are all non-persisted resources."
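For completeness, the one verb that does work can be exercised like this; a minimal sketch using the kubernetes Python client (the token value is a placeholder):
from kubernetes import client, config

config.load_kube_config()
auth = client.AuthenticationV1Api()

# TokenReview only supports create; the response is ephemeral and never stored.
review = client.V1TokenReview(
    spec=client.V1TokenReviewSpec(token="<some-bearer-token>")  # placeholder token
)
result = auth.create_token_review(review)
print(result.status.authenticated, result.status.user)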

Related

kubectl command to GKE Autopilot sometimes returns forbidden error

Environment
GKE Autopilot v1.22.12-gke.2300
kubectl is run from an Ubuntu 20.04 VM
using gke-gcloud-auth-plugin
What happens
The kubectl command sometimes returns a (Forbidden) error, e.g.:
kubectl get pod
Error from server (Forbidden): pods is forbidden: User "my-mail#domain.com" cannot list resource "pods" in API group "" in the namespace "default": GKEAutopilot authz: the request was sent before policy enforcement is enabled
It doesn't always happen, so it must not be an IAM problem (it happens about 40% of the time).
Before, on GKE Autopilot v1.21.xxxx, this error didn't happen, or at least not this frequently.
I couldn't find any helpful info even when I searched for "GKEAutopilot authz" or "the request was sent before policy enforcement is enabled".
I'd appreciate it if anyone who has faced the same issue has any ideas.
Thank you in advance.
I asked Google Cloud support.
They said it was a bug on the GKE master, and it has been fixed by them.
This problem doesn't happen anymore.

GKE Metadata server errors

I have a GKE cluster with Workload Identity enabled.
Most of our workloads use the Cloud Storage or Cloud Logging GCP client libraries, which means they actually use Workload Identity for GCP access.
Recently we started adding Secret Manager to the stack and began encountering random Metadata Server errors on workload startup. It happens across different frameworks.
Python:
File "/venv/lib/python3.8/site-packages/google/auth/compute_engine/credentials.py", line 117, in refresh six.raise_from(new_exc, caught_exc) File "<string>", line 3, in raise_from google.auth.exceptions.RefreshError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb'Not Found\\n'", <google.auth.transport.requests._Response object at 0x7f3a3084dd60>)
NodeJS:
failed to initialize. exiting. Error: 16 UNAUTHENTICATED: Failed to retrieve auth metadata with error: Could not refresh access token: network timeout at: http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform at Object
I’m trying to understand why it's happening.
First, 404 Not Found means we are trying to get metadata that does not exist or has been deleted. The thing is, it recovers a few seconds later, so I'm not sure how exactly.
Based on the documentation, it can take some time for the metadata server to become available, hence errors that 'recover' afterwards. So the recommendation is to add delays in the app code, or use init containers, until the metadata server is up.
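(For reference, the in-app delay would presumably be something as simple as polling the metadata endpoint before making any GCP calls; a rough sketch, with the endpoint and timeout chosen arbitrarily:)
import time
import urllib.request

METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/service-accounts/default/email")

def wait_for_metadata(timeout_seconds=60):
    # Poll the metadata server until it answers, or give up after the timeout.
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        try:
            req = urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})
            urllib.request.urlopen(req, timeout=2)
            return True
        except Exception:
            time.sleep(1)  # not ready yet, retry
    return False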
I wonder if that's really the best approach, to add an init container to all of our workloads, and whether it really fits our use case, as the error code is a bit misleading. Also, I'm not quite sure why it only started when we added Secret Manager.
This sometimes happens due to OOM issues on the Metadata server. You can check the status of the pod running the metadata server using:
kubectl -n kube-system describe pods <pod_name>
You can get the pod_name using:
kubectl get pods --namespace kube-system
The pod name will start with the prefix gke-metadata-server-.
If you see something like the following in the output when you describe the pod:
Last State: Terminated
Reason: OOMKilled
then that would indicate an OOM issue.
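If you'd rather script that check, a rough equivalent with the kubernetes Python client (assuming a working kubeconfig; pod and container names are read from the cluster):
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod("kube-system").items:
    if not pod.metadata.name.startswith("gke-metadata-server-"):
        continue
    for status in pod.status.container_statuses or []:
        terminated = status.last_state.terminated if status.last_state else None
        if terminated and terminated.reason == "OOMKilled":
            print(f"{pod.metadata.name}: container {status.name} was OOMKilled")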
Some mitigations that you can try:
Check if you have unused ServiceAccounts in your cluster and whether you can remove them.
Check if you are creating too many clients (a new one for every API request). Sharing clients where possible will reduce token refresh calls to the metadata server and thus save memory (see the sketch after this list).
Check if you can find the metadata server's definition under /etc/kubernetes/addons/. If you can, increase its memory limit and apply the updated config.
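As a rough illustration of the client-sharing point above (this assumes the google-cloud-storage library; the bucket and blob names are placeholders):
from google.cloud import storage

# Create the client once at startup so token refreshes only happen when the
# cached token expires, instead of on every request.
STORAGE_CLIENT = storage.Client()

def read_blob(bucket_name: str, blob_name: str) -> bytes:
    # Reuse the shared client rather than constructing storage.Client() here.
    return STORAGE_CLIENT.bucket(bucket_name).blob(blob_name).download_as_bytes()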

GKE pod replica count in cluster

How can we obtain the count of pods running in a GKE cluster? I found there are ways to get the node count, but we need the pod count as well. Ideally it would use something that doesn't require logging in GCP Operations.
You can do it with the Kubernetes Python client library, as shown in this question posted by Pradeep Padmanaban C, where he was looking for a more efficient way of doing it. His example is actually the best you can do for such an operation, as there is no dedicated method that would let you just count pods without retrieving their entire JSON manifests:
from kubernetes import client, config
config.load_kube_config()
v1 = client.CoreV1Api()
ret_pod = v1.list_pod_for_all_namespaces(watch=False)
print(len(ret_pod.items))
You can also use a different method, which lets you retrieve pods only from a specific namespace, e.g.:
v1.list_namespaced_pod("default")
With kubectl you can do it as follows (as proposed here by RammusXu):
kubectl get pods --all-namespaces --no-headers | wc -l
You can directly access the Kubernetes API using a RESTful API call. You will need to make sure you provide authentication by including a bearer token in your call.
Once you are able to query the API server directly, you can use GET <master_endpoint>/api/v1/pods to list all the pods in the cluster. You can also limit it to a specific namespace with /api/v1/namespaces/<namespace>/pods.
Keep in mind that the kubectl CLI tool is just a wrapper around API calls; each kubectl command forms a RESTful API call in a format similar to the one listed above, so any interaction you have with the cluster using kubectl can also be achieved through RESTful API calls.
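For example, a rough sketch of that call in Python (the API server address and token are placeholders you supply yourself; the CA bundle path shown is the in-cluster default and is an assumption, so adjust as needed):
import requests

API_SERVER = "https://<master_endpoint>"  # placeholder
TOKEN = "<bearer-token>"                  # placeholder

response = requests.get(
    f"{API_SERVER}/api/v1/pods",
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify="/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",  # adjust outside the cluster
)
response.raise_for_status()
print(len(response.json()["items"]))  # total pod count across all namespaces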

What does the 'UPDATE' Kubernetes RBAC permission do?

I cannot for the life of me find a detailed table of what all the Kubernetes RBAC verbs do. The only resource I see people recommending is this one, which is woefully inadequate.
So I've been working it out by experimentation.
Most are fairly straightforward so far, except for UPDATE. This does not seem to be able to do anything I would expect it to.
Permissions I gave my alias:
[GET, UPDATE] on [deployments] in default namespace.
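(For reference, a Role carrying just that pair of verbs would look roughly like this via the Python client; this is only a sketch, and the role name is a placeholder, not how I actually set it up:)
from kubernetes import client, config

config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

role = client.V1Role(
    metadata=client.V1ObjectMeta(name="deploy-get-update", namespace="default"),
    rules=[client.V1PolicyRule(
        api_groups=["apps"],
        resources=["deployments"],
        verbs=["get", "update"],
    )],
)
rbac.create_namespaced_role("default", role)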
Things I've tried:
kubectl set image deployment/hello-node echoserver=digitalocean/flask-helloworld --as user
kubectl edit deploy hello-node --as user
kubectl apply -f hello-node.yaml --as eks-user
These all failed with error: deployments.apps "hello-node" is forbidden: User "user" cannot patch resource "deployments" in API group "apps" in the namespace "default"
I then tried some rollout commands like:
k rollout undo deploy hello-node --as user
But they failed because I didn't have replica-set access.
TLDR: What is the point of the Kubernetes RBAC update verb?
For that matter, does anyone have a more detailed list of all RBAC verbs?
Following up on this, I went to the Kubernetes REST API documentation, which has a long list of all the HTTP API calls you can make to the REST server.
I thought this would help because the one (1) table available describing what the different verbs can do did so by comparing them to HTTP verbs. So the plan was:
See what HTTP verb the update permission is equated to.
Go to the reference and find an example of using that HTTP verb on a deployment.
Test the kubectl equivalent.
So.
What HTTP verb equals the update permission?
PUT.
Example of using PUT for deployments?
Replace Scale: replace scale of the specified Deployment
HTTP Request
PUT /apis/apps/v1/namespaces/{namespace}/deployments/{name}/scale
What's the equivalent kubectl command?
Well we're scaling a deployment, so I'm going to say:
kubectl scale deployment hello-node --replicas=2
Can I run this command?
I extended my permissions to deployment/scale first, and then ran it.
Error from server (Forbidden): deployments.apps "hello-node" is forbidden: User "user" cannot patch resource "deployments/scale" in API group "apps" in the namespace "default"
Well. That also needs patch permissions, it would appear.
Despite the fact that the HTTP verb used is PUT according to the API docs, and PUT is equivalent to update according to the one (1) source of any information on these RBAC verbs.
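(As far as I can tell, the update verb only covers literal replace operations, i.e. HTTP PUT, which kubectl rarely issues; the error above suggests kubectl scale sends a patch under the hood. A rough sketch of what a genuine replace looks like with the Python client, using my hello-node deployment, assuming get and update on deployments/scale:)
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Read the current Scale object (needs "get" on deployments/scale),
# then replace it, which is an HTTP PUT and so maps to the "update" verb.
scale = apps.read_namespaced_deployment_scale("hello-node", "default")
scale.spec.replicas = 2
apps.replace_namespaced_deployment_scale("hello-node", "default", scale)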
Anyway.
My Conclusion: It appears that update is indeed pretty useless, at least for Deployments.
The RBAC setup seemed promising at first, but honestly it's starting to lose its lustre as I discover more and more edge cases and undocumented mysteries. Access permissions seem like the absolute worst thing to be vague about, or your security ends up being more through obscurity than certainty.
You can get a dump of the "allowed/supported" verbs using the krew plugin rbac-tool:
# Generate a ClusterRole with all the available permissions for core and apps api groups
$ kubectl rbac-tool show --for-groups=,apps
While it won't tell you the exact semantics of each verb, it will give you a sense of the RBAC permissions universe your cluster has.

Fetching Stackdriver Monitoring TimeSeries data for a pod running on a k8s cluster on GKE using the REST API

My objective is to fetch the time series of a metric for a pod running on a kubernetes cluster on GKE using the Stackdriver TimeSeries REST API.
I have ensured that Stackdriver monitoring and logging are enabled on the kubernetes cluster.
Currently, I am able to fetch the time series of all the resources available in a cluster using the following filter:
metric.type="container.googleapis.com/container/cpu/usage_time" AND resource.labels.cluster_name="<MY_CLUSTER_NAME>"
In order to fetch the time series of a given pod id, I am using the following filter:
metric.type="container.googleapis.com/container/cpu/usage_time" AND resource.labels.cluster_name="<MY_CLUSTER_NAME>" AND resource.labels.pod_id="<POD_ID>"
This filter returns an HTTP 200 OK with an empty response body. I have found the pod ID from the metadata.uid field received in the response of the following kubectl command:
kubectl get deploy -n default <SERVICE_NAME> -o yaml
However, when I use the Pod ID of a background container spawned by GKE/Stackdriver, I do get the time series values.
Since I am able to see Stackdriver metrics of my pod on the GKE UI, I believe I should also get the metric values using the REST API.
My doubts/questions are:
Am I fetching the Pod ID of my pod correctly using kubectl?
Could there be some issue with my cluster setup/service deployment due to which I'm unable to fetch the metrics?
Is there some other way in which I can get the time series of my pod using the REST APIs?
I wouldn't rely on kubectl get deploy for pod ids. I would get them with something like kubectl -n default get pods | grep <prefix-for-your-pod> | awk '{print $1}'
I don't think so, but the best way to find out is opening a support ticket with GCP if you have any doubts.
Not that I'm aware of; Stackdriver is the monitoring solution in GCP. Again, you can check with GCP support. There are other tools that you can use to get metrics from Kubernetes, like Prometheus. There are multiple guides on the web on how to set it up with Grafana on k8s. This is one, for example.
Hope it helps!
Am I fetching the Pod ID of my pod correctly using kubectl?
You could use JSONPath output with kubectl, in this case iterating over the Pods and fetching the metadata.name and metadata.uid fields:
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.uid}{"\n"}{end}'
which will output something like this:
nginx-65899c769f-2j775 d4fr5t6-bc2f-11e8-81e8-42010a84011f
nginx2-77b5c9d48c-7qlps 4f5gh6r-bc37-11e8-81e8-42010a84011f
Could there be some issue with my cluster setup/service deployment due to which I'm unable to fetch the metrics?
As @Rico mentioned in his answer, contacting GCP support could be a way forward if you don't get anywhere with the troubleshooting; see below.
Is there some other way in which I can get the time series of my pod using the REST APIs?
You could use the APIs Explorer or the Metrics Explorer from within the Stackdriver portal. There's some good troubleshooting tips here with a link to the APIs Explorer. In the Stackdriver Metrics Explorer it's fairly easy to reassemble the filter you've used using dropdown lists to choose e.g. a particular pod_id.
Taken from the Troubleshooting the Monitoring guide (linked above) regarding an empty HTTP 200 response on filtered queries:
If your API call returns status code 200 and an empty response, there are several possibilities:
If your call uses a filter, then the filter might not have matched anything. The filter match is case-sensitive. To resolve filter problems, start by specifying only one filter component, such as metric.type, and see if you get results. Add the other filter components one-by-one.
If you are working with a custom metric, you might not have specified the project where your custom metric is defined.
I found this link when reading through the documentation of the Monitoring API. That link will get you to the APIs Explorer with some pre-filled fields, change these accordingly and add your own filter.
I have not tested the REST API further at the moment, but hopefully this helps you move forward.
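If you end up scripting the query rather than using the APIs Explorer, a rough equivalent with the Cloud Monitoring Python client library could look like this (a client-library stand-in for the raw REST call; the project ID, cluster name, and pod ID are placeholders):
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/<MY_PROJECT_ID>"  # placeholder

# Query the last hour of data points.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": project_name,
        "filter": (
            'metric.type="container.googleapis.com/container/cpu/usage_time" '
            'AND resource.labels.cluster_name="<MY_CLUSTER_NAME>" '
            'AND resource.labels.pod_id="<POD_ID>"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    print(series.resource.labels, len(series.points))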