I want to enable Stackdriver logging with my Kubernetes cluster on GKE.
It's stated here: https://kubernetes.io/docs/user-guide/logging/stackdriver/
This article assumes that you have created a Kubernetes cluster with cluster-level logging support for sending logs to Stackdriver Logging. You can do this either by selecting the Enable Stackdriver Logging checkbox in the create cluster dialogue in GKE, or by setting the KUBE_LOGGING_DESTINATION flag to gcp when manually starting a cluster using kube-up.sh.
But my cluster was created without this option enabled.
How do I change the environment variable while my cluster is running?
Unfortunately, logging isn't a setting that can be enabled or disabled on a cluster once it is running. This is something that we hope to change in the near future, but in the meantime your best bet is to delete and recreate your cluster (sorry!).
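For reference, a hedged sketch of recreating the cluster with Stackdriver logging enabled; the cluster name and zone are placeholders, and the exact flag depends on your gcloud version (newer SDKs use --logging-service / --logging instead):

    # Recreate the cluster with Stackdriver logging turned on
    gcloud container clusters create my-cluster \
        --zone us-central1-a \
        --enable-cloud-logging

    # Or, for a cluster started manually with kube-up.sh (as quoted above):
    KUBE_LOGGING_DESTINATION=gcp ./cluster/kube-up.sh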
Related
I have a Kubernetes cluster on GCP that hosts a Flask application and some more services.
Before upgrading the master node to version 1.15 (it was 1.14.x), I saw every log from the Flask application under Stackdriver's GKE Container logs; now I don't get any logs.
Searching through the release notes I noticed that from 1.15 they:
disabled stackdriver logging agent to prevent node startup failures
I'm not entirely sure that's the reason, but I am sure that the logging stopped after upgrading the master and node versions to 1.15; there has been no code change in the application core.
My question is how can I reactivate the logs I saw before?
I actually found the solution: as stated in the release notes, the Stackdriver agent becomes disabled by default in 1.15.
To activate it again, you need to edit the cluster following these instructions, setting "System and workload logging and monitoring" under "Stackdriver Kubernetes Engine Monitoring".
After that, I could no longer use the legacy Stackdriver Monitoring, and I found that my logs weren't under the resource "GKE Container" but under "Kubernetes Container".
I also had to update every log-based metric that had a filter on resource.type="container", changing it to resource.type="k8s_container".
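For reference, a hedged CLI equivalent of the console change above; the cluster name and zone are placeholders, and the flag assumes a gcloud version that offers --enable-stackdriver-kubernetes:

    # Switch the cluster to Stackdriver Kubernetes Engine Monitoring
    gcloud container clusters update my-cluster \
        --zone us-central1-a \
        --enable-stackdriver-kubernetes

    # Log-based metric filters then need the new resource type, e.g.:
    #   before: resource.type="container"
    #   after:  resource.type="k8s_container"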
We have a Kubernetes deployment on Google Cloud Platform. Recently we hit one of the well-known issues related to kube-dns failing under a high volume of requests: https://github.com/kubernetes/kubernetes/issues/56903 (it's more related to SNAT/DNAT and conntrack, but the end result is that kube-dns goes out of service).
After a few days of digging into that topic, we found that k8s already has a solution, which is currently in alpha (https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/).
The solution is to run a caching CoreDNS as a DaemonSet on each k8s node. So far so good.
The problem is that after you create the DaemonSet, you have to tell the kubelet to use it with the --cluster-dns option, and we can't find any way to do that in the GKE environment. Google bootstraps the cluster with the "configure-sh" script in the instance metadata. There is an option to edit the instance template and "hardcode" the required values, but that is not really an option: if you upgrade the cluster or use horizontal autoscaling, all of the modified values will be lost.
The last idea was to use a custom startup script that pulls the configuration and updates the metadata server, but this is too complicated a task.
As of 2019/12/10, GKE now supports this through the gcloud CLI in beta:
Kubernetes Engine
Promoted NodeLocalDNS Addon to beta. Use --addons=NodeLocalDNS with gcloud beta container clusters create. This addon can be enabled or disabled on existing clusters using --update-addons=NodeLocalDNS=ENABLED or --update-addons=NodeLocalDNS=DISABLED with gcloud container clusters update.
See https://cloud.google.com/sdk/docs/release-notes#27300_2019-12-10
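Putting the commands from the release notes together (the cluster name is a placeholder):

    # Create a cluster with the NodeLocalDNS addon (beta at the time):
    gcloud beta container clusters create my-cluster --addons=NodeLocalDNS

    # Enable or disable the addon on an existing cluster:
    gcloud container clusters update my-cluster --update-addons=NodeLocalDNS=ENABLED
    gcloud container clusters update my-cluster --update-addons=NodeLocalDNS=DISABLED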
You can spin up another kube-dns deployment, e.g. in a different node pool, and thus have two nameservers in the pods' resolv.conf.
This would mitigate the evictions and other failures and generally allow you to completely control your kube-dns service across the whole cluster, as sketched below.
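One way to end up with two nameservers in a pod's resolv.conf is to set them explicitly through dnsConfig; a minimal sketch, assuming the two kube-dns services have the hypothetical ClusterIPs 10.0.0.10 and 10.0.0.20:

    apiVersion: v1
    kind: Pod
    metadata:
      name: dns-example            # hypothetical pod
    spec:
      containers:
      - name: app
        image: nginx
      dnsPolicy: "None"            # take DNS settings only from dnsConfig below
      dnsConfig:
        nameservers:
        - 10.0.0.10                # first kube-dns service IP (hypothetical)
        - 10.0.0.20                # second kube-dns service IP (hypothetical)
        searches:
        - default.svc.cluster.local
        - svc.cluster.local
        - cluster.local
        options:
        - name: ndots
          value: "5"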
In addition to what was mentioned in this answer - With beta support on GKE, the nodelocal caches now listen on the kube-dns service IP, so there is no need for a kubelet flag change.
I want to enable Kubernetes Engine Monitoring on clusters but I don't see that as a field in Terraform's google_container_cluster resource.
Is Kubernetes Engine Monitoring managed with another resource?
You can use the newer Kubernetes Monitoring Service by setting monitoring_service to monitoring.googleapis.com/kubernetes instead of the default monitoring.googleapis.com.
When enabling this, you will also need to set logging_service to logging.googleapis.com/kubernetes, as sketched below.
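A minimal sketch in Terraform, with placeholder name, location, and node count:

    resource "google_container_cluster" "primary" {
      name               = "my-cluster"       # placeholder
      location           = "us-central1-a"    # placeholder
      initial_node_count = 1

      # Kubernetes Engine Monitoring / Logging instead of the legacy defaults
      monitoring_service = "monitoring.googleapis.com/kubernetes"
      logging_service    = "logging.googleapis.com/kubernetes"
    }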
I have a namespace in k8s with the setting:
scheduler.alpha.kubernetes.io/defaultTolerations:
'[{"key": "role_va", "operator": "Exists"}]'
If I am not mistaken, all pods created in this namespace should get this toleration.
But the pods don't get it.
I read this and understood that I must enable the PodTolerationRestriction admission controller.
How can I do this on gcloud?
In order to enable PodTolerationRestriction, you would need to add it to the --enable-admission-plugins flag in the kube-apiserver configuration. According to the official documentation, this plugin is not included in the default list of admission controller plugins.
However, on GKE there is no way to adjust specific flags of the running API server configuration, because the managed control-plane components are not exposed to user modification (related Stack Overflow thread).
Given that, you could consider using GCE instead and bootstrapping a cluster on GCE VMs with whatever cluster-building solution you prefer; there you can set the flag yourself, as sketched below.
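For completeness, a hedged sketch of what that flag looks like on a self-managed kube-apiserver (not something you can change on GKE, as noted above); NodeRestriction stands in for whatever plugins you already enable, and the rest of your apiserver flags stay as they are:

    kube-apiserver --enable-admission-plugins=NodeRestriction,PodTolerationRestriction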
We are unable to grab logs from the containers running in our GKE cluster if Stackdriver is disabled on GCP. I understand that it proxies stderr/stdout, but it seems rather heavy-handed to block these outputs when Stackdriver is disabled.
How does one get an ELK stack going on GKE without being billed for Stackdriver, i.e. by disabling it entirely? Or is it so much a part of GKE that this is not doable?
From the article linked on a similar question regarding GCP:
"Kubernetes doesn’t specify a logging agent, but two optional logging agents are packaged with the Kubernetes release: Stackdriver Logging for use with Google Cloud Platform, and Elasticsearch. You can find more information and instructions in the dedicated documents. Both use fluentd with custom configuration as an agent on the node." (https://kubernetes.io/docs/concepts/cluster-administration/logging/#exposing-logs-directly-from-the-application)
Perhaps our understanding of Stackdriver billing is wrong?
But we don't want to be billed for Stackdriver, as the 150MB of logs outside of the GCP metrics is not going to be enough, and we have some expertise in setting up ELK for logging that we'd like to use.
You can disable Stackdriver logging/monitoring on Kubernetes by editing your cluster and setting "Stackdriver Logging" and "Stackdriver Monitoring" to Disabled.
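If you prefer the CLI, a hedged equivalent; the cluster name and zone are placeholders, and flag names have shifted across gcloud versions:

    # Turn off Stackdriver logging and monitoring for the cluster
    gcloud container clusters update my-cluster \
        --zone us-central1-a \
        --logging-service=none \
        --monitoring-service=none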
I would still suggest sticking with GCP over AWS, as you get the whole Kubernetes-as-a-service experience. Amazon's solution is still a little way off, and (last I heard) they are planning to charge for the service in addition to the EC2 node prices.