How can I find GKE's control plane logs? - kubernetes

So there's this page about auditing-logs and I'm very confused about:
The k8s.io service is used for Kubernetes audit logs. These logs are generated by the Kubernetes API Server component and they contain information about actions performed using the Kubernetes API. For example, any changes you make on a Kubernetes resource by using the kubectl command are recorded by the k8s.io service. For more information, see Auditing in the Kubernetes documentation.
The container.googleapis.com service is used for GKE control plane audit logs. These logs are generated by the GKE internal components and they contain information about actions performed using the GKE API. For example, any changes you perform on a GKE cluster configuration using a gcloud command are recorded by the container.googleapis.com service.
which one shall I pick to get:
/var/log/kube-apiserver.log - API Server, responsible for serving the API
/var/log/kube-controller-manager.log - Controller that manages replication controllers
Or is this similar to EKS, where "audit logs" means a separate thing?
Audit (audit) – Kubernetes audit logs provide a record of the individual users, administrators, or system components that have affected your cluster. For more information, see Auditing in the Kubernetes documentation.

If the cluster still exists, you should be able to do the following on GKE
kubectl proxy
curl http://localhost:8001/logs/kube-apiserver.log
AFAIK, there's no way to get server logs for clusters that have been deleted.

Logs for GKE control-plane components are available since November 29, 2022 for clusters with versions 1.22.0 and later.
You simply need to activate them on the cluster, either
via CLI:
gcloud container clusters update [CLUSTER_NAME] \
    --region=[REGION] \
    --logging=SYSTEM,WORKLOAD,API_SERVER,SCHEDULER,CONTROLLER_MANAGER
or via the GCP web console: open the cluster details, and in the "Features" section edit the "Cloud Logging" entry and add the "Control Plane" components.
See documentation for details.
Note the caveats in the documentation, especially about the risk of reaching the logging.googleapis.com/write_requests quota.
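Once enabled, these logs land in Cloud Logging and can be queried from the CLI. A minimal sketch, assuming the documented `k8s_control_plane_component` resource type and an `apiserver` component label (adjust the filter to your cluster):

```shell
# Cloud Logging filter for GKE control-plane component logs.
# "apiserver" can be swapped for e.g. "scheduler" or "controller-manager".
FILTER='resource.type="k8s_control_plane_component" resource.labels.component_name="apiserver"'

# Only run the query when the gcloud CLI is available; it also needs
# credentials with at least roles/logging.viewer, so failures are tolerated.
if command -v gcloud >/dev/null 2>&1; then
  gcloud logging read "$FILTER" --limit=20 --format=json || true
fi
```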

You cannot; GKE does not expose those component log files. Audit logs are a different thing: they are a record of API actions, not the raw server logs.
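To see the audit side in practice, here is a hedged sketch of pulling the Admin Activity audit entries for a cluster with `gcloud` (the filter strings follow Cloud Audit Logs naming conventions; verify the exact log name against your project):

```shell
# Admin Activity audit entries for GKE: records of who did what through the
# Kubernetes API (k8s.io) and the GKE API (container.googleapis.com).
FILTER='logName:"cloudaudit.googleapis.com%2Factivity" resource.type="k8s_cluster"'

# Requires the gcloud CLI and credentials; failures are tolerated here.
if command -v gcloud >/dev/null 2>&1; then
  gcloud logging read "$FILTER" --limit=10 --format=json || true
fi
```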

Related

Listing pod exec sessions in Kubernetes

I've got a Cassandra pod in which I can see several cqlsh sessions running. Since this is a shared dev Kubernetes cluster, I figure it must be some devs running kubectl exec into the pod to run queries.
My question is whether there is a way to list, and get information on, all the exec sessions for a given pod.
As mentioned in the link shared by @Marian Theisen:
You would want to set up API auditing on the pod/exec endpoint.
Kubernetes auditing provides a security-relevant, chronological set of records documenting the sequence of actions in a cluster. The cluster audits the activities generated by users, by applications that use the Kubernetes API, and by the control plane itself.
Also refer to this link for more information.
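As a sketch, an audit policy that records exec calls could look like the following (the field names follow the upstream `audit.k8s.io/v1` Policy schema; note that on most managed clusters you cannot supply your own policy file and must rely on the provider's audit log offering instead):

```shell
# Write a minimal audit policy that records requests to the pods/exec
# subresource, including the request body (which carries the exec command).
cat <<'EOF' > audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log who exec'd into which pod, with the request body included.
  - level: Request
    resources:
      - group: ""            # core API group
        resources: ["pods/exec"]
  # Don't log anything else.
  - level: None
EOF
```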

To read secret from etcd in AKS using etcdctl throws Error: open /etc/kubernetes/pki/apiserver-etcd-client.crt: no such file or directory

To read a secret from etcd in an AKS cluster, I used the command below:
ETCDCTL_API=3 etcdctl --endpoints=<endpoint> --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt --key=/etc/kubernetes/pki/apiserver-etcd-client.key get / --prefix --keys-only
It fails with: Error: open /etc/kubernetes/pki/apiserver-etcd-client.crt: no such file or directory.
Where are the certificates stored by default?
I referred to the doc https://docs.starlingx.io/security/kubernetes/etcd-certificates-c1fc943e4a9c.html for the certificate path.
It seems to me that you have the wrong mental model of AKS (and managed Kubernetes solutions in general).
Basically:
Managed Kubernetes solutions (like AKS, GKE, EKS) abstract some of the cluster components away from the user, meaning you won't be able to access them.
Kubernetes clusters that are not managed by a cloud provider (like on-premise clusters) give the user access to pretty much everything.
The bullet points above are only meant to narrow down the issue. There are many differences between cloud-managed and self-managed solutions, and I encourage you to check them out.
Example reference:
Serverfault.com: What is the point of running self managed Kubernetes cluster
In short terms:
You will not get access to etcd on AKS.
You won't find the etcd certificates on your VM or in Azure Cloud Shell.
Citing official Microsoft documentation:
Control plane
When you create an AKS cluster, a control plane is automatically created and configured. This control plane is provided at no cost as a managed Azure resource abstracted from the user. You only pay for the nodes attached to the AKS cluster. The control plane and its resources reside only in the region where you created the cluster.
The control plane includes the following core Kubernetes components:
kube-apiserver – The API server is how the underlying Kubernetes APIs are exposed. This component provides the interaction for management tools, such as kubectl or the Kubernetes dashboard.
etcd – To maintain the state of your Kubernetes cluster and configuration, the highly available etcd is a key value store within Kubernetes.
kube-scheduler – When you create or scale applications, the Scheduler determines what nodes can run the workload and starts them.
kube-controller-manager – The Controller Manager oversees a number of smaller Controllers that perform actions such as replicating pods and handling node operations.
AKS provides a single-tenant control plane, with a dedicated API server, scheduler, etc. You define the number and size of the nodes, and the Azure platform configures the secure communication between the control plane and nodes. Interaction with the control plane occurs through Kubernetes APIs, such as kubectl or the Kubernetes dashboard.
While you don't need to configure components (like a highly available etcd store) with this managed control plane, you can't access the control plane directly. Kubernetes control plane and node upgrades are orchestrated through the Azure CLI or Azure portal. To troubleshoot possible issues, you can review the control plane logs through Azure Monitor logs.
To configure or directly access a control plane, deploy a self-managed Kubernetes cluster using Cluster API Provider Azure.
-- Docs.microsoft.com: Azure: AKS: Concepts clusters workloads: Control plane
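For completeness, AKS surfaces those control-plane logs through diagnostic settings rather than direct access. A rough sketch with the `az` CLI, where the resource names are placeholders and the log categories (such as `kube-audit`) are assumptions based on Azure's documented diagnostic categories; verify them against your subscription:

```shell
# Hypothetical names; replace with your own resource group / cluster / workspace.
RESOURCE_GROUP="my-rg"
CLUSTER_NAME="my-aks"
LOG_CATEGORIES='[{"category":"kube-apiserver","enabled":true},{"category":"kube-audit","enabled":true}]'

# Only attempt the calls when the az CLI is available; they also need Azure
# credentials, so failures are tolerated here.
if command -v az >/dev/null 2>&1; then
  AKS_ID=$(az aks show --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" --query id -o tsv) || true
  az monitor diagnostic-settings create \
    --name aks-control-plane-logs \
    --resource "$AKS_ID" \
    --logs "$LOG_CATEGORIES" \
    --workspace "<log-analytics-workspace-id>" || true
fi
```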

How can I check if a resource inside Kubernetes has been deleted for some reason?

I am a junior developer currently running a service in a Kubernetes environment.
How can I check if a resource inside Kubernetes has been deleted for some reason?
As a simple example, if a deployment is deleted, I want to know which user deleted it.
Could you please tell me which log to look at?
And I would like to know how to collect these logs.
I don't have much experience yet, so I'm asking for help.
Also, if you have a reference or link, please share it. It will be very helpful to me.
Thank you:)
Start by enabling auditing; there are lots of online resources about how to do this.
If you are on AWS and using EKS, I would suggest enabling Amazon EKS control plane logging. It streams audit and diagnostic logs to Amazon CloudWatch Logs, where they are easily accessible and useful for audit and compliance requirements. Control plane logs make it easier to secure and run your clusters and make the entire system more auditable.
As per AWS documentation:
Kubernetes API server component logs (api) – Your cluster's API server is the control plane component that exposes the Kubernetes API. For more information, see kube-apiserver in the Kubernetes documentation.
Audit (audit) – Kubernetes audit logs provide a record of the individual users, administrators, or system components that have affected your cluster. For more information, see Auditing in the Kubernetes documentation.
Authenticator (authenticator) – Authenticator logs are unique to Amazon EKS. These logs represent the control plane component that Amazon EKS uses for Kubernetes Role-Based Access Control (RBAC) authentication using IAM credentials. For more information, see Cluster authentication.
Controller manager (controllerManager) – The controller manager manages the core control loops that are shipped with Kubernetes. For more information, see kube-controller-manager in the Kubernetes documentation.
Scheduler (scheduler) – The scheduler component manages when and where to run pods in your cluster. For more information, see kube-scheduler in the Kubernetes documentation.
Reference: https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html
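Enabling the log types above can be scripted; here is a hedged sketch using the documented `aws eks update-cluster-config` call (the cluster name is a placeholder):

```shell
CLUSTER_NAME="my-cluster"
# Log types from the EKS documentation: api, audit, authenticator,
# controllerManager, scheduler.
LOGGING='{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'

# Requires the AWS CLI and credentials; failures are tolerated here.
if command -v aws >/dev/null 2>&1; then
  aws eks update-cluster-config --name "$CLUSTER_NAME" --logging "$LOGGING" || true
fi
```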

Which CloudWatch log contains EKS' Kubernetes events?

I had a few pods restarting in my EKS cluster. I could see that they were SIGKILL'ed by K8s. Now I would like to know the reason but I can't because the Kubernetes events TTL is only one hour.
I am checking the control plane logs for the EKS cluster in CloudWatch now but don't know which of them contains these messages as well.
Which of the logs contains these events from K8s?
Yes, you are right: the default value of --event-ttl is 60m00s, and unfortunately there is currently no native option to change that value in EKS. The GitHub issue is still open, without any promising timeframe.
As per the guide you sent, and as per Streaming EKS Metrics and Logs to CloudWatch, if you configured everything correctly you can find the logs under "Container Insights" in the drop-down menu.
Logs you might want to check:
Control plane logs consist of scheduler logs, API server logs, and audit logs.
Data plane logs consist of kubelet and container runtime engine logs.
Can you please specify which exact logs you have in your CloudWatch control plane logs and what you have already checked? Maybe that will help.
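If control plane logging is on, you can also grep the control-plane streams for the pod from the CLI. A sketch, assuming the log group name `/aws/eks/<cluster>/cluster` from the EKS documentation; the cluster and pod names are placeholders:

```shell
CLUSTER_NAME="my-cluster"
LOG_GROUP="/aws/eks/${CLUSTER_NAME}/cluster"

# Requires the AWS CLI and credentials; failures are tolerated here.
if command -v aws >/dev/null 2>&1; then
  # Search all control-plane streams (kube-apiserver-audit-*, etc.)
  # for mentions of the restarting pod.
  aws logs filter-log-events \
    --log-group-name "$LOG_GROUP" \
    --filter-pattern '"my-restarting-pod"' \
    --max-items 50 || true
fi
```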

Flink native kubernetes deployment

I have some limitations around the rights required by the Flink native deployment.
The prerequisites say:
KubeConfig, which has access to list, create, delete pods and **services**, configurable
Specifically, my issue is that I cannot have a service account with the rights to create/remove services. Creating/removing pods is not an issue, but by policy services can only be created by an internal tool.
Could there be any workaround for this?
Flink creates two services in the native Kubernetes integration:
The internal service, which is used for internal communication between the JobManager and the TaskManagers. It is only created when HA is not enabled, since the HA service is used for leader retrieval when HA is enabled.
The rest service, which is used for accessing the web UI or the REST endpoint. If you have another way to expose the REST endpoint, or you are using application mode, it is optional in principle. However, it is currently always created, so I think you would need to change some code to work around this.
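To make the constraint concrete, the prerequisite roughly corresponds to a Role like the following sketch (the name and the exact verb list are assumptions; check the Flink native Kubernetes documentation for the precise set your version needs):

```shell
# Minimal RBAC sketch of what the Flink native integration expects.
cat <<'EOF' > flink-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: flink-native   # hypothetical name
rules:
  # Pod rights are allowed under the asker's policy.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "delete"]
  # Service rights are what the policy forbids; Flink currently needs them
  # because the rest service is always created.
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch", "create", "delete"]
EOF
```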