Running a GKE cluster with 1.8.1 - when I look at /logs/kube-apiserver-audit.log, it's completely empty. I've taken actions like creating deployments and deleting pods that have been visible in audit logs for clusters I've provisioned outside of GKE.
Is there a better way to view or access these kinds of events with GKE?
That would be because the Container Engine 1.8 release does not enable the audit logging feature yet. From the Release Notes:
KNOWN ISSUE: Audit Logging, a beta feature in Kubernetes 1.8, is currently not enabled on Container Engine.
It will probably be enabled at some point in the future; I'd keep an eye on the Release Notes.
Related
I have recently updated DataDog to use a Cluster Agent. I am currently trying to set up the Kubernetes integration. This should be auto-discovered through auto_conf.yaml, but for some reason, after updating to the Cluster Agent, we lost the metrics from the Kubernetes integration. My guess was to set it as a cluster check by adding cluster_check: true in the auto_conf.yaml file, but that did not work. I currently have it set up only on the node agents, configured just as described in this documentation. Is there something else that needs to be done to set up the Kubernetes integration with a Cluster Agent?
Solved the issue by adding kubernetes_state_core via the following manifests. This uses kube-state-metrics v2.0.
https://github.com/DataDog/datadog-agent/tree/main/Dockerfiles/manifests/kubernetes_state_core
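For reference, a rough sketch of how to apply them (this assumes you clone the repo and that the manifests fit your cluster as-is; adjust namespaces and your Agent/Cluster Agent configuration to match your setup):

    # Clone the repo that contains the kubernetes_state_core manifests
    git clone https://github.com/DataDog/datadog-agent.git

    # Apply every manifest in the kubernetes_state_core directory
    kubectl apply -f datadog-agent/Dockerfiles/manifests/kubernetes_state_core/

    # Check that the kube-state-metrics core pods come up
    kubectl get pods --all-namespaces | grep -i state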
I am a junior developer currently running a service in a Kubernetes environment.
How can I check if a resource inside Kubernetes has been deleted for some reason?
As a simple example, if a deployment is deleted, I want to know which user deleted it.
Could you please tell me which log to look at?
I would also like to know how to collect these logs.
I don't have much experience yet, so I'm asking for help.
Also, if you have a reference or link, please share it. It will be very helpful to me.
Thank you:)
Start by enabling audit logging; there are lots of online resources about how to do this.
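If you manage the control plane yourself (kubeadm or similar), a minimal sketch looks like this; the paths and the policy below are assumptions to adapt, and on managed offerings (GKE, EKS, AKS) you would enable the provider's audit/control-plane logging instead, since you cannot edit the API server flags directly:

    # Write a minimal audit policy that records who deleted what.
    # Metadata level captures the user, verb, resource and timestamp
    # without storing request/response bodies.
    cat <<'EOF' > /etc/kubernetes/audit-policy.yaml
    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
      # Record delete operations on any resource
      - level: Metadata
        verbs: ["delete", "deletecollection"]
      # Ignore everything else to keep the log small
      - level: None
    EOF

    # Then point the kube-apiserver at the policy and a log file, e.g. by adding
    # these flags to its static pod manifest (often /etc/kubernetes/manifests/kube-apiserver.yaml):
    #   --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    #   --audit-log-path=/var/log/kubernetes/audit/audit.log
    #   --audit-log-maxage=30

    # Once events are flowing, find out who deleted a deployment:
    grep '"verb":"delete"' /var/log/kubernetes/audit/audit.log | grep deployments

The user.username field of each matching event tells you which user issued the delete.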
If you are on AWS and using EKS, I would suggest enabling "Amazon EKS control plane logging". This streams audit and diagnostic logs to Amazon CloudWatch Logs, where they are more easily accessible and useful for audit and compliance requirements. Control plane logs make it easier to secure and run your clusters and make the entire system more auditable (a CLI sketch follows the list below).
As per AWS documentation:
Kubernetes API server component logs (api) – Your cluster's API server is the control plane component that exposes the Kubernetes API. For more information, see kube-apiserver in the Kubernetes documentation.
Audit (audit) – Kubernetes audit logs provide a record of the individual users, administrators, or system components that have affected your cluster. For more information, see Auditing in the Kubernetes documentation.
Authenticator (authenticator) – Authenticator logs are unique to Amazon EKS. These logs represent the control plane component that Amazon EKS uses for Kubernetes Role-Based Access Control (RBAC) authentication using IAM credentials. For more information, see Cluster authentication.
Controller manager (controllerManager) – The controller manager manages the core control loops that are shipped with Kubernetes. For more information, see kube-controller-manager in the Kubernetes documentation.
Scheduler (scheduler) – The scheduler component manages when and where to run pods in your cluster. For more information, see kube-scheduler in the Kubernetes documentation.
Reference: https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html
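As a concrete sketch (the cluster name and region are placeholders), you can turn these log types on with the AWS CLI and then query the audit stream in CloudWatch Logs Insights:

    # Enable all five control plane log types for the cluster
    aws eks update-cluster-config \
      --region us-east-1 \
      --name my-cluster \
      --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'

    # The logs land in the CloudWatch log group /aws/eks/my-cluster/cluster.
    # Example Logs Insights query to see who deleted a deployment:
    #   fields @timestamp, user.username, verb, objectRef.resource, objectRef.name
    #   | filter verb = "delete" and objectRef.resource = "deployments"
    #   | sort @timestamp desc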
We have been running a cluster on GKE for around three years. As such, legacy authorization is enabled.
The control plane has been getting updated automatically, and our node pools are running a mixture of 1.12 and 1.14.
We have an increasing number of services, and are planning on incrementally adopting Istio.
We want to enable a minimal RBAC setup without causing errors and downtime of our services.
I haven't been able to find any guides for how to accomplish this. Some people say just to enable RBAC authorization on the GKE cluster, but I assume that would take down all of our services.
It has also been implied that k8s can run in a hybrid ABAC/RBAC mode, but we can't tell whether ours is or not!
Is there a good guide for migrating to RBAC for GKE?
If your cluster is regional you won't have downtime in your application during the upgrade, but if your cluster is single-zonal or multi-zonal the best approach here is:
Add a new node pool
Cordon and drain the old node pool so the applications migrate to the new node pool
Delete the old node pool after all pods are migrated.
This is the safest way to update a zonal node pool without downtime. Please read the references below to understand every step in detail.
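A rough sketch of those three steps with gcloud and kubectl (the pool names, machine type, and zone are placeholders; double-check the drain flags against your workloads, especially anything using local storage):

    # 1. Add a new node pool alongside the old one
    gcloud container node-pools create new-pool \
      --cluster=my-cluster --zone=us-central1-a \
      --machine-type=n1-standard-4 --num-nodes=3

    # 2. Cordon every node in the old pool so nothing new is scheduled there,
    #    then drain them so the pods are recreated on the new pool
    for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=old-pool -o name); do
      kubectl cordon "$node"
      kubectl drain "$node" --ignore-daemonsets --delete-local-data
    done

    # 3. Delete the old node pool once everything is running on the new one
    gcloud container node-pools delete old-pool \
      --cluster=my-cluster --zone=us-central1-a

(--delete-local-data was renamed --delete-emptydir-data in newer kubectl releases.)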
References:
https://kubernetes.io/docs/concepts/architecture/nodes/#reliability
https://kubernetes.io/docs/reference/kubectl/cheatsheet/#interacting-with-nodes-and-cluster
I have a Kubernetes cluster on GCP that hosts a Flask application and some more services.
Before upgrading the master node to version 1.15 (it was 1.14.x) I saw every log from the Flask application in Stackdriver's GKE Container logs; now I don't get any logs.
Searching through the release notes I noticed that from 1.15 they:
disabled stackdriver logging agent to prevent node startup failures
I'm not entirely sure that's the reason, but I am sure that the logging stopped after upgrading the master and node versions to 1.15; there has been no code change in the application core.
My question is how can I reactivate the logs I saw before?
I actually found the solution: as stated in the release notes, the Stackdriver agent becomes disabled by default in 1.15.
To activate it again you need to edit the cluster following these instructions, setting "System and workload logging and monitoring" under "Stackdriver Kubernetes Engine Monitoring".
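The gcloud equivalent of that console edit should be roughly the following (the cluster name and zone are placeholders; the flag switches the cluster to the newer Stackdriver Kubernetes Engine Monitoring):

    # Enable system and workload logging/monitoring
    gcloud container clusters update my-cluster \
      --zone=us-central1-a \
      --enable-stackdriver-kubernetes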
After that, I could no longer use the legacy Stackdriver Monitoring, and I found my logs weren't under the resource "GKE Container" but under "Kubernetes Container".
I also had to update every log-based metric that had a filter on resource.type="container", changing it to resource.type="k8s_container".
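For the log-based metrics, something along these lines should work (the metric name and the rest of the filter are placeholders for your own):

    # Point an existing log-based metric at the new resource type
    gcloud logging metrics update my-flask-errors \
      --log-filter='resource.type="k8s_container" AND severity>=ERROR'

    # Sanity check: read recent container logs under the new resource type
    gcloud logging read 'resource.type="k8s_container" AND resource.labels.cluster_name="my-cluster"' --limit=10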
We are unable to grab logs from our GKE cluster's running containers if Stackdriver is disabled on GCP. I understand that it is proxying stderr/stdout, but it seems rather heavy-handed to block these outputs when Stackdriver is disabled.
How does one get an ELK stack going on GKE without being billed for Stackdriver, i.e. by disabling it entirely? Or is it so much a part of GKE that this is not doable?
From the article linked on a similar question regarding GCP:
"Kubernetes doesn’t specify a logging agent, but two optional logging agents are packaged with the Kubernetes release: Stackdriver Logging for use with Google Cloud Platform, and Elasticsearch. You can find more information and instructions in the dedicated documents. Both use fluentd with custom configuration as an agent on the node." (https://kubernetes.io/docs/concepts/cluster-administration/logging/#exposing-logs-directly-from-the-application)
Perhaps our understanding of Stackdriver billing is wrong?
But we don't want to be billed for Stackdriver, as the 150 MB of logs outside of the GCP metrics is not going to be enough, and we have some expertise in setting up ELK for logging that we'd like to use.
You can disable Stackdriver logging/monitoring on Kubernetes by editing your cluster, and setting "Stackdriver Logging" and "Stackdriver Monitoring" to disabled.
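The gcloud equivalent is roughly this (the cluster name and zone are placeholders); with both services set to none, GKE stops shipping stdout/stderr to Stackdriver and you can run your own logging agents on the nodes instead (e.g. fluentd shipping to Elasticsearch):

    # Turn off Stackdriver logging and monitoring for the cluster
    gcloud container clusters update my-cluster \
      --zone=us-central1-a \
      --logging-service=none \
      --monitoring-service=none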
I would still suggest sticking with GCP over AWS, as you get the whole Kube-as-a-service experience. Amazon's solution is still a little way off, and they are planning to charge for the service in addition to the EC2 node prices (last I heard).