I have a microservice scaled out across several pods on Google Kubernetes Engine (GKE). As a multi-cloud shop, we have our logging/monitoring/telemetry in Azure Application Insights.
Our data should be kept inside Europe, so our GCP Kubernetes cluster is set up with:
Master zone: europe-west1-b
Node zones: europe-west1-b
When I create a node pool on this cluster, the nodes apparently have the zone europe-west1-b (as expected), as seen on the "Node details" page of the Google Cloud Platform Console.
However, in Azure Application Insights, the telemetry reported by the applications running in pods in this node pool gives client_City as "Mountain View" and client_StateOrProvince as "California", and in some cases "Ann Arbor" in "Michigan".
At first I waved this strange location off as just some inter-cloud issue (e.g. the receiving end defaulting to something strange when the information is not filled out as expected, or similar).
But now Application Insights has actually pointed out a quite significant performance difference depending on whether my pod is running in Michigan or in California, which led me to believe that these fields are actually correct.
Is GCP fooling me? Am I looking at the wrong place? How can I make sure my GCP Kubernetes nodes are running in Europe?
This is essential for me to know, both from a GDPR perspective and, of course, performance (latency) wise.
Azure Application Insights is fooling you, because the external IP was registered by Google in California, without considering that these addresses are used by data centers distributed all over the globe. I also have a GCE instance deployed in Frankfurt am Main, while its IP appears as if it were in Mountain View. Stackdriver might report the actual locations (and not some vague GeoIP locations).
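One way to check where a node really runs, instead of trusting a GeoIP lookup of its external IP, is to ask the GCE metadata server from the node itself (or from a pod, assuming the metadata server is reachable from it). The sketch below uses the metadata path instance/zone with the required Metadata-Flavor: Google header; the helper names parse_zone, fetch_zone and require_europe are made up for illustration.

```python
import urllib.request
from typing import Tuple

# GCE metadata endpoint; only reachable from inside Google Cloud.
METADATA_ZONE_URL = "http://metadata.google.internal/computeMetadata/v1/instance/zone"


def parse_zone(metadata_value: str) -> Tuple[str, str]:
    """'projects/123456/zones/europe-west1-b' -> ('europe-west1-b', 'europe-west1')."""
    zone = metadata_value.strip().rsplit("/", 1)[-1]
    region = zone.rsplit("-", 1)[0]  # drop the zone letter suffix
    return zone, region


def fetch_zone(timeout: float = 2.0) -> Tuple[str, str]:
    """Ask the metadata server which zone this VM actually runs in."""
    request = urllib.request.Request(
        METADATA_ZONE_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_zone(response.read().decode("utf-8"))


def require_europe(region: str) -> None:
    """Raise if the region is outside Europe (e.g. as a startup check)."""
    if not region.startswith("europe-"):
        raise RuntimeError(f"running in {region}, expected a europe-* region")
```

On a node of the cluster described in the question, fetch_zone() should report europe-west1-b regardless of what Application Insights shows for client_City.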
I am trying to deploy a Kubernetes cluster in the cloud, and there are two options: the masters are either entrusted to the cloud provider or maintained by myself.
So I wonder whether masters entrusted to the provider could leak the data on the workers.
In short, will the master know the data on the workers/nodes?
The abstractions in Kubernetes are very well defined with clear boundaries. You have to understand the concept of Volumes first. As defined here,
A Kubernetes volume is essentially a directory accessible to all containers running in a pod. In contrast to the container-local filesystem, the data in volumes is preserved across container restarts.
Volumes are attached to the containers in a pod, and there are several types of volumes.
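To make that definition concrete, here is a sketch of a pod in which two containers share one emptyDir volume. The pod name, container names, image and paths are hypothetical. It is written as a Python dict and emitted as JSON, which kubectl accepts just like YAML. Note that an emptyDir volume is stored on the worker node that runs the pod, i.e. it is exactly the kind of "data on workers" the question is about.

```python
import json

# Hypothetical pod: "writer" appends to a file that "reader" tails.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "shared-volume-demo"},
    "spec": {
        # One volume, defined at pod level, lives on the node's disk.
        "volumes": [{"name": "scratch", "emptyDir": {}}],
        "containers": [
            {
                "name": "writer",
                "image": "busybox",
                "command": ["sh", "-c", "while true; do date >> /data/log; sleep 5; done"],
                "volumeMounts": [{"name": "scratch", "mountPath": "/data"}],
            },
            {
                "name": "reader",
                "image": "busybox",
                "command": ["sh", "-c", "tail -F /data/log"],
                "volumeMounts": [{"name": "scratch", "mountPath": "/data"}],
            },
        ],
    },
}

if __name__ == "__main__":
    # e.g.  python pod.py | kubectl apply -f -
    print(json.dumps(pod, indent=2))
```

The file under /data survives a restart of either container, but it is removed when the pod is deleted from the node.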
[Figure: the layers of abstraction]
Master to Cluster communication
There are two primary communication paths from the master (apiserver) to the cluster. The first is from the apiserver to the kubelet process which runs on each node in the cluster. The second is from the apiserver to any node, pod, or service through the apiserver’s proxy functionality.
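Both paths are ordinary REST paths on the apiserver. The helpers below only build them (the function names are hypothetical): the pod log path is served by the apiserver calling the kubelet, and the proxy paths use the apiserver's proxy functionality. Anyone with sufficient RBAC permissions can fetch such a path with kubectl get --raw PATH, which illustrates that the control plane can reach data on the workers.

```python
from typing import Optional


def pod_log_path(namespace: str, pod: str) -> str:
    """apiserver -> kubelet: container logs (what `kubectl logs` reads)."""
    return f"/api/v1/namespaces/{namespace}/pods/{pod}/log"


def node_proxy_path(node: str, subpath: str = "") -> str:
    """apiserver proxy to a node's kubelet, e.g. subpath='metrics'."""
    return f"/api/v1/nodes/{node}/proxy/{subpath.lstrip('/')}"


def pod_proxy_path(namespace: str, pod: str, port: Optional[int] = None,
                   subpath: str = "") -> str:
    """apiserver proxy to a port inside a pod."""
    target = pod if port is None else f"{pod}:{port}"
    return f"/api/v1/namespaces/{namespace}/pods/{target}/proxy/{subpath.lstrip('/')}"


def service_proxy_path(namespace: str, service: str, port: str = "",
                       subpath: str = "") -> str:
    """apiserver proxy to a service; port may be a port name or number."""
    target = f"{service}:{port}" if port else service
    return f"/api/v1/namespaces/{namespace}/services/{target}/proxy/{subpath.lstrip('/')}"
```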
Also, you should look at the cloud controller manager (CCM). The CCM concept (not to be confused with the binary) was originally created to allow cloud-specific vendor code and the Kubernetes core to evolve independently of one another. The cloud controller manager runs alongside other master components such as the Kubernetes controller manager, the API server, and the scheduler. It can also be started as a Kubernetes addon, in which case it runs on top of Kubernetes.
I hope this answers your questions about the master accessing data on the workers.
If you are still looking for more ways to secure your cluster, check out "11 Ways (Not) to Get Hacked".
Short answer: yes the control plane can access all of your data.
Longer and more realistic answer: probably don't worry about it. It is far more likely that any successful attack against the control plane would be just as successful as if you were running it yourself. The exact internal details of GKE/AKS/EKS are a bit fuzzy, but all three providers have a lot of experience running multi-tenant systems and it wouldn't be negligent to trust that they have enough protections in place against lateral escalations between tenants on the control plane.
I have a GKE cluster running in us-central1 with a preemptible node pool. I have nodes in each zone (us-central1-b, us-central1-c, us-central1-f). For the last 10 hours, I have been getting the following error for the underlying node VM:
Instance '[instance-name]' creation failed: The zone '[instance-zone]' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
I tried creating new clusters in different regions with different machine types, using HA (multi-zone) settings and I get the same error for every cluster.
I saw an issue on the Google Cloud Status Dashboard and tried the console, as recommended, but it fails with a timeout error.
Is anyone else having this problem? Any idea what I may be doing wrong?
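The error message itself suggests trying a different zone or trying again later. A sketch that automates exactly that, assuming a hypothetical create callback (for example one wrapping a gcloud call) that raises a hypothetical ZoneExhausted exception when a zone is out of capacity:

```python
import time
from typing import Callable, Iterable, Optional


class ZoneExhausted(Exception):
    """Hypothetical: raised by `create` when a zone reports it is out of resources."""


def create_with_fallback(
    zones: Iterable[str],
    create: Callable[[str], str],
    attempts_per_zone: int = 3,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each zone in order; back off exponentially between retries in a zone."""
    last_error: Optional[Exception] = None
    for zone in zones:
        for attempt in range(attempts_per_zone):
            try:
                return create(zone)
            except ZoneExhausted as err:
                last_error = err
                if attempt + 1 < attempts_per_zone:
                    # 30 s, 60 s, 120 s, ... with the default base_delay
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"no zone had capacity; last error: {last_error}")
```

Passing the us-central1 zones first and a zone such as us-west2-a last would retry each us-central1 zone a few times before falling back, which is roughly what the updates below ended up doing by hand.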
UPDATES
Nov 11
I stood up a cluster in us-west2; this was the only region that would work. I used the gcloud command line, since the UI did not seem to work. The Google Cloud Status Dashboard had a note about this situation: use gcloud, not the UI.
I tried creating node pools in us-central1 with both the gcloud command line and the UI, to no avail.
I'm now federating deployments across regions and standing up multi-region ingress.
Nov 12
Cannot create HA clusters in us-central1; same message as listed above.
Reached out via twitter and received a response.
Working with the K8s guide to federation to see if I can get multi-cluster running. Most likely going to use Kelsey Hightower's approach.
The only problem: I can't spin up clusters to federate.
Findings
Talked with Google support; I would need a $150/month support package to get a technical person to answer my questions.
Preemptible instances are not a good option for a primary node pool. I did this because I'm cheap, it bit me hard.
The new architecture is a primary node pool with committed use VMs that do not autoscale, and a secondary node pool with preemptible instances for autoscale needs. The secondary pool will have minimum nodes = 0 and max nodes = 5 (for right now); this cluster is regional so instances are across all zones.
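The secondary pool described above could be created roughly like this. The sketch only builds the gcloud argument list; the cluster, pool and region names are placeholders, while --preemptible, --enable-autoscaling, --min-nodes and --max-nodes are standard gcloud container node-pools create flags. In a regional cluster the node limits apply per zone, so a maximum of 5 means up to 5 nodes in each zone.

```python
import shlex
import subprocess
from typing import List


def preemptible_pool_cmd(cluster: str, region: str, pool: str = "preemptible-pool",
                         machine_type: str = "n1-standard-1",
                         min_nodes: int = 0, max_nodes: int = 5) -> List[str]:
    """Build the gcloud call for an autoscaled preemptible node pool.

    In a regional cluster, min_nodes/max_nodes are per zone.
    """
    if not 0 <= min_nodes <= max_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes")
    return [
        "gcloud", "container", "node-pools", "create", pool,
        f"--cluster={cluster}",
        f"--region={region}",
        f"--machine-type={machine_type}",
        "--preemptible",
        "--enable-autoscaling",
        f"--min-nodes={min_nodes}",
        f"--max-nodes={max_nodes}",
    ]


def create_pool(cmd: List[str]) -> None:
    """Echo and run the command; raises CalledProcessError on failure."""
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Keeping the builder separate from create_pool makes it easy to review the exact command before anything touches the project.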
An n1-standard-1 under sustained use (assuming it runs 24/7) gets a 30% discount off list price.
An n1-standard-1 with a 1-year committed use contract gets about a 37% discount off list price.
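To see what those two discounts mean per month, here is the arithmetic with a made-up hourly list price of $0.05 (not a real GCP price) and the common approximation of 730 hours per month:

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12, a common approximation
HYPOTHETICAL_LIST_PRICE = 0.05  # USD per hour, illustrative only


def monthly_cost(hourly_list_price: float, discount: float) -> float:
    """Monthly cost of one VM running 24/7 at the given fractional discount."""
    return hourly_list_price * HOURS_PER_MONTH * (1.0 - discount)


list_price = monthly_cost(HYPOTHETICAL_LIST_PRICE, 0.0)
sustained = monthly_cost(HYPOTHETICAL_LIST_PRICE, 0.30)  # sustained use
committed = monthly_cost(HYPOTHETICAL_LIST_PRICE, 0.37)  # 1-year commitment

if __name__ == "__main__":
    print(f"list:      ${list_price:.2f}")
    print(f"sustained: ${sustained:.2f}")
    print(f"committed: ${committed:.2f}")
    print(f"committed vs sustained: {1 - committed / sustained:.0%} cheaper")
```

Whatever the actual list price, the ratio is the same: 0.63 / 0.70 = 0.90, so under these percentages committed use is about 10% cheaper than sustained use.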
Preemptible instances are re-provisioned every 24 hours, if they are not taken from you earlier when resource demand spikes in the region.
I believe I fell prey to a resource spike in us-central1.
A must-watch for people looking to federate K8s: Kelsey Hightower - CNCF Keynote | Kubernetes Federation
Issue appears to be resolved as of Nov 13th.
We are unable to get logs from the containers in our GKE cluster if Stackdriver is disabled on GCP. I understand that it is proxying stderr/stdout, but it seems rather heavy-handed to block these outputs when Stackdriver is disabled.
How does one get an ELF stack going on GKE without being billed for Stackdriver, i.e. by disabling it entirely? Or is it so much a part of GKE that this is not doable?
From the article linked on a similar question regarding GCP:
"Kubernetes doesn’t specify a logging agent, but two optional logging agents are packaged with the Kubernetes release: Stackdriver Logging for use with Google Cloud Platform, and Elasticsearch. You can find more information and instructions in the dedicated documents. Both use fluentd with custom configuration as an agent on the node." (https://kubernetes.io/docs/concepts/cluster-administration/logging/#exposing-logs-directly-from-the-application)
Perhaps our understanding of Stackdriver billing is wrong?
But we don't want to be billed for Stackdriver, as the 150MB of logs outside of the GCP metrics is not going to be enough, and we have some expertise in setting up ELF for logging that we'd like to use.
You can disable Stackdriver logging/monitoring on Kubernetes by editing your cluster, and setting "Stackdriver Logging" and "Stackdriver Monitoring" to disable.
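The same change can be sketched from the command line. This assumes a zonal cluster and the older gcloud container clusters update flags --logging-service=none and --monitoring-service=none (newer gcloud releases use --logging=NONE and --monitoring=NONE instead). The cluster and zone names are placeholders, and the two updates are issued as separate calls to be safe.

```python
import subprocess
from typing import List


def disable_stackdriver_cmds(cluster: str, zone: str) -> List[List[str]]:
    """One gcloud update per service (logging, then monitoring)."""
    base = ["gcloud", "container", "clusters", "update", cluster, f"--zone={zone}"]
    return [
        base + ["--logging-service=none"],
        base + ["--monitoring-service=none"],
    ]


def apply(cmds: List[List[str]]) -> None:
    for cmd in cmds:
        subprocess.run(cmd, check=True)  # stop at the first failure
```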
I would still suggest sticking with GCP over AWS, as you get the whole Kubernetes-as-a-service experience. Amazon's solution is still some way off, and (last I heard) they are planning to charge for the service in addition to the EC2 node prices.
We have some Kubernetes clusters that have been deployed using kops in AWS.
We really like using the upstream/official images.
We have been wondering whether or not there was a good way to monitor the systems without installing software directly on the hosts? Are there docker containers that can extract the information from the host? I think that we are likely concerned with:
Disk space (this seems to be passed through to Docker via df)
Host CPU utilization
Host memory utilization
Is this host/node level information already available through heapster?
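For reference, the host-level numbers listed above can be read from a container without installing anything on the host, provided the host's /proc and root filesystem are bind-mounted into it (for example with hostPath volumes in a DaemonSet); node_exporter does the same thing via its --path.procfs and --path.rootfs flags. The mount points /host/proc and /host/root and all function names below are assumptions for illustration.

```python
import os
import shutil
from typing import Dict, Tuple

# Assumed bind mounts of the host's /proc and / inside the container.
HOST_PROC = os.environ.get("HOST_PROC", "/host/proc")
HOST_ROOT = os.environ.get("HOST_ROOT", "/host/root")


def read(path: str) -> str:
    with open(path) as f:
        return f.read()


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo lines ('MemTotal:  16318480 kB') into bytes per key."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        amount = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            amount *= 1024  # the kernel's "kB" means KiB
        values[key.strip()] = amount
    return values


def memory_utilisation(meminfo: Dict[str, int]) -> float:
    return 1.0 - meminfo["MemAvailable"] / meminfo["MemTotal"]


def parse_cpu_times(stat_text: str) -> Tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # user nice system idle iowait irq softirq steal; guest time is
            # already included in user/nice, so later fields are ignored.
            fields = [int(x) for x in line.split()[1:]][:8]
            idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
            return idle, sum(fields)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilisation(before: Tuple[int, int], after: Tuple[int, int]) -> float:
    """Fraction of non-idle CPU time between two parse_cpu_times() samples."""
    idle = after[0] - before[0]
    total = after[1] - before[1]
    return 0.0 if total <= 0 else 1.0 - idle / total


def disk_utilisation(path: str = HOST_ROOT) -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total
```

CPU utilisation needs two samples of read(f"{HOST_PROC}/stat") taken a few seconds apart; memory and disk are point-in-time readings.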
This is not really a question about kops, but a question about operating Kubernetes. kops stops at the point of having a functional k8s cluster: you have networking, DNS, and nodes have joined the cluster. From there, the world is your oyster.
There are many different options for monitoring with k8s. If you are a small team I usually recommend offloading monitoring and logging to a provider.
If you are a larger team or have more specific needs then you can look at such options as Prometheus and others. Poke around in the https://github.com/kubernetes/charts repository, as I know there is a Prometheus chart there.
As with any deployment of any form of infrastructure you are going to need Logging, Monitoring, and Metrics. Also, do not forget to monitor the monitoring ;)
I am using https://prometheus.io/, it goes naturally with kubernetes.
The Kubernetes API already exposes a bunch of metrics in Prometheus format.
https://github.com/kubernetes/ingress-nginx also exposes Prometheus metrics (enable-vts-status: "true"), and you can also install https://github.com/prometheus/node_exporter as a DaemonSet to monitor CPU, disk, etc.
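The Prometheus text format these components expose is simple enough to produce by hand. The sketch below is standard library only and uses a hypothetical metric (app_queue_depth) and an arbitrary port; in a real service the official prometheus_client library is the usual choice. It renders HELP/TYPE lines plus samples and serves them on /metrics for Prometheus to scrape.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, List, Tuple

# family name -> (help text, metric type, [(labels, value), ...])
Families = Dict[str, Tuple[str, str, List[Tuple[Dict[str, str], float]]]]


def _escape_label(value: str) -> str:
    # Label values escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def _escape_help(text: str) -> str:
    # HELP text escapes only backslash and newline.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def render(families: Families) -> str:
    lines: List[str] = []
    for name, (help_text, metric_type, samples) in families.items():
        lines.append(f"# HELP {name} {_escape_help(help_text)}")
        lines.append(f"# TYPE {name} {metric_type}")
        for labels, value in samples:
            if labels:
                pairs = ",".join(f'{k}="{_escape_label(v)}"' for k, v in sorted(labels.items()))
                lines.append(f"{name}{{{pairs}}} {value}")
            else:
                lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def current_metrics() -> Families:
    # Hypothetical metric; a real service would report its own state here.
    return {"app_queue_depth": ("Items waiting in the work queue.", "gauge",
                                [({"queue": "emails"}, 3)])}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render(current_metrics()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Blocks forever; port 8000 is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```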
I run one Prometheus inside the cluster to monitor internal metrics and one outside the cluster to monitor LBs and URLs.
Both send alerts to the same Alertmanager (https://github.com/prometheus/alertmanager), which MUST be outside the cluster.
It took me about a week to configure everything properly.
It was worth it.
The Kubernetes HA documentation shows that you can ensure availability in the case of the failure of an apiserver by having multiple instances behind a load balancer.
However, it doesn't cover what happens if Kubernetes is deployed across multiple availability zones. There is some documentation here, but it doesn't really go into failure scenarios.
What is best practice here? Should you pin the api-servers to instances inside each AZ? What happens in the event of a split brain? If I have a pod running in one AZ and it becomes unavailable to the rest of the world, what happens to it?
I specifically want to know about a custom on-premise installation, not AWS or GCE.
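Whatever the topology, the load balancer in front of the apiservers needs a health check to decide which instances receive traffic. Below is a sketch of such a check grouped by availability zone. It assumes the apiservers answer GET /healthz with "ok" to unauthenticated clients (newer releases also offer /livez and /readyz), and the endpoints and zone names are placeholders. It shows which zones still have a working apiserver; it does not by itself answer the split-brain question.

```python
import ssl
import urllib.request
from typing import Callable, Dict, List, Optional

Probe = Callable[[str], bool]


def probe_healthz(endpoint: str, cafile: Optional[str] = None,
                  timeout: float = 2.0) -> bool:
    """True if https://host:port/healthz answers 200 with body 'ok'."""
    context = ssl.create_default_context(cafile=cafile)  # cluster CA, if given
    try:
        with urllib.request.urlopen(f"{endpoint}/healthz", timeout=timeout,
                                    context=context) as resp:
            return resp.status == 200 and resp.read().strip() == b"ok"
    except OSError:  # refused, timed out, TLS failure, or HTTP error status
        return False


def healthy_by_zone(endpoints_by_zone: Dict[str, List[str]],
                    probe: Probe = probe_healthz) -> Dict[str, List[str]]:
    """Keep only the endpoints in each zone that pass the probe."""
    return {zone: [ep for ep in eps if probe(ep)]
            for zone, eps in endpoints_by_zone.items()}


def zones_without_apiserver(health: Dict[str, List[str]]) -> List[str]:
    return sorted(zone for zone, eps in health.items() if not eps)
```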