Google Cloud Kubernetes Engine troubleshooting - kubernetes

is anyone experiencing issue with the Google Kubernetes Engine (specifically in us-central1-b region - April 3rd 11pm EST)
I'm not able to to see my workload or any of my cluster configurations in the Google Kubernetes Engine section, it is intermittent (one minute is there then disappears)
Also can't connect to the Kubernetes section of the Google Cloud Console to check on my pods !!

No information of any issues as far as I can see. Could have been a network connection issue on your side, please check.
It appears to me that this is some issue about getting metadata from the cluster. You could check the following to help you troubleshoot or see if there is any underlying problems:
$ docker info
$ docker version
$ kubectl version
$ gcloud container clusters describe <your-cluster-name-) --zone <your-cluster-zone>
$ kubectl get componentstatuses

Related

Problems in connection when I try to connect GKE cluster using kubectl

I have a running cluster on Google Cloud Kubernetes engine and I want to access that using kubectl from my local system.
I tried installing kubectl with gcloud but it didn't worked. Then I installed kubectl using apt-get. When I try to see the version of it using kubectl version it says
Unable to connect to server EOF. I also don't have file ~/.kube/config, which I am not sure why. Can someone please tell me what I am missing here? How can I connect to the already running cluster in GKE?
gcloud container clusters get-credentials ... will auth you against the cluster using your gcloud credentials.
If successful, the command adds appropriate configuration to ~/.kube/config such that you can kubectl.

How to install Kubernetes v1.10.11 on a GCP cluster?

There was recently a Kubernetes security hole that was patched in v1.10.11 (among other versions), so I would like to upgrade to that version. I am currently on v1.10.9. However, when running the command gcloud container get-server-config to get the list of valid node versions, v1.10.11 doesn't show up. Instead, it jumps straight from v1.10.9 to v1.11.2.
Does anyone have any idea why I cannot seem to use the usual gcloud container clusters upgrade [CLUSTER_NAME] --cluster-version [CLUSTER_VERSION] to upgrade to this version?
Thanks in advance!
Based on:
https://cloud.google.com/kubernetes-engine/docs/security-bulletins#december-3-2018
If you have Kubernetes in v1.10.9 you should (to patch this security hole) update your GKE Cluster to 1.10.9-gke.5.
The following Kubernetes versions are now available for new clusters and for opt-in master upgrades for existing clusters:
1.9.7-gke.11,
1.10.6-gke.11,
1.10.7-gke.11,
1.10.9-gke.5,
1.11.2-gke.18
Please validate your Scheduled master auto-upgrades option in GKE.
If it's enabled your cluster masters were auto-upgraded by Google and the next possible version to update is further version so v1.11.2, what is showing by GKE for you.

TLS handshake timeout with kubernetes in GKE

I've created a cluster on Google Kubernetes Engine (previously Google Container Engine) and installed the Google Cloud SDK and the Kubernetes tools with it on my Windows machine.
It worked well for some time, and, out of nowhere, it stopped working. Every command I'm issuing with kubectl provokes the following:
Unable to connect to the server: net/http: TLS handshake timeout
I've searched Google, the Kubernetes Github Issues, Stack Overflow, Server Fault ... without success.
I've tried the following:
Restart my computer
Change wifi connection
Check that I'm not somehow using a proxy
Delete and re-create my cluster
Uninstall the Google Cloud SDK (and kubectl) from my machine and re-install them
Delete my .kube folder (config and cache)
Check my .kube/config
Change my cluster's version (tried 1.8.3-gke.0 and 1.7.8-gke.0)
Retry several hours later
Tried both on PowerShell and cmd.exe
Note that the cluster seem to work perfectly, since I have my application running on it and can interact with it normally through the Google Cloud Shell.
Running:
gcloud container clusters get-credentials cluster-2 --zone europe-west1-b --project ___
kubectl get pods
works on Google Cloud Shell and provokes the TLS handshake timeout on my machine.
For others seeing this issue, there is another cause to consider.
After doing:
gcloud config set project $PROJECT_NAME
gcloud config set container/cluster $CLUSTER_NAME
gcloud config set compute/zone europe-west2
gcloud beta container clusters get-credentials $CLUSTER_NAME --region europe-west2 --project $PROJECT_NAME
I was then seeing:
kubectl cluster-info
Unable to connect to the server: net/http: TLS handshake timeout
I tried everything suggested here and elsewhere. When the above worked without issue from my home desktop, I discovered that shared workspace wifi was disrupting TLS/VPNs to control the internet access!
This is what I did to solve the above problem.
I simply ran the following commands::
> gcloud container clusters get-credentials {cluster_name} --zone {zone_name} --project {project_name}
> gcloud auth application-default login
Replace the placeholders appropriately.
So this MAY NOT work for you on GKE, but Azure AKS (managed Kubernetes) has a similar problem with the same error message so who knows — this might be helpful to someone.
The solution to this for me was to scale the nodes in my Cluster from the Azure Kubernetes service blade web console.
Workaround / Solution
Log into the Azure (or GKE) Console — Kubernetes Service UI.
Scale your cluster up by 1 node.
Wait for scale to complete and attempt to connect (you should be able to).
Scale your cluster back down to the normal size to avoid cost increases.
Total time it took me ~2 mins.
More Background Info on the Issue
Added this to the full ticket description write up that I posted over here (if you want more info have a read):
'Unable to connect Net/http: TLS handshake timeout' — Why can't Kubectl connect to Azure AKS server?

K8S dashboard not accessible after first cluster in GKE - GCP using console

Newbie setup :
Created First project in GCP
Created cluster with default, 3 nodes. Node version 1.7.6. cluster master version 1.7.6-gke.1.
Deployed aan application in a pod, per example.
Able to access "hello world" and the hostname, using the external-ip and the port.
In GCP / GKE webpage of my cloud console, clicked "discovery and loadbalancing", I was able to see the "kubernetes-dashboard" process in green-tick, but cannot access throught the IP listed. tried 8001,9090, /ui and nothing worked.
not using any cloud shell or gcloud commands on my local laptop. Everything is done on console.
Questions :
How can anyone access the kubernetes-dashboard of the cluster created in console?
docs are unclear, are the dashboard components incorporated in the console itself? Are the docs out of sync with GCP-GKE screens?
tutorial says run "kubectl proxy" and then to open
"http://localhost:8001/ui", but it doesnt work, why?
If you create a cluster with with version 1.9.x or greater, then u can access using tokens.
get secret.
kubectl -n kube-system describe secrets `kubectl -n kube-system get secrets | awk '/clusterrole-aggregation-controller/ {print $1}'` | awk '/token:/ {print $2}'
Copy secret.
kubectl proxy.
Open UI using 127.0.0.1:8001/ui. This will redirect to login page.
there will be two options to login, kubeconfig and token.
Select token and paste the secret copied earlier.
hope this helps
It seems to be an issue with the internal Kubernetes DNS service starting at version 1.7.6 on Google Cloud.
The solution is to access the dashboard at this endpoint instead:
http://localhost:8001/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
Github Issue links:
https://github.com/kubernetes/dashboard/issues/2368
https://github.com/kubernetes/kubernetes/issues/52729
The address of the dashboard service is only accessible from inside of the cluster. If you ssh into a node in your cluster, you should be able to connect to the dashboard. You can verify this by noticing that the address is within the services CIDR range for your cluster.
The dashboard in running as a pod inside of your cluster with an associated service. If you open the Workloads view you will see the kubernetes-dashboard deployment and can see the pod that was created by the deployment. I'm not sure which docs you are referring to, since you didn't provide a link.
When you run kubectl proxy it creates a secure connection from your local machine into your cluster. It works by connecting to your master and then running through a proxy on the master to the pod/service/host that you are connecting to via an ssh tunnel. It's possible that it isn't working because the ssh tunnels are not running; you should verify that your project has newly created ssh rules allowing access from the cluster endpoint IP address. Otherwise, if you could explain more about how it fails, that would be useful for debugging.
First :
gcloud container clusters get-credentials cluster-1 --zone my-zone --project my-project
Then find your kubernetes dashboard endpoint doing :
kubectl cluster-info
It will be something like https://42.42.42.42/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy
Install kube-dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
Run:
$ kubectl proxy
Access:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login

Kubernetes unable to pull images from gcr.io

I am trying to setup Kubernetes for the first time. I am following the Fedora Manual installation guide: http://kubernetes.io/v1.0/docs/getting-started-guides/fedora/fedora_manual_config.html
I am trying to get the kubernetes addons running , specifically the kube-ui. I created the service and replication controller like so:
kubectl create -f cluster/addons/kube-ui/kube-ui-rc.yaml --namespace=kube-system
kubectl create -f cluster/addons/kube-ui/kube-ui-svc.yaml --namespace=kube-system
When i run
kubectl get events --namespace=kube-system
I see errors such as this:
Failed to pull image "gcr.io/google_containers/pause:0.8.0": image pull failed for gcr.io/google_containers/pause:0.8.0, this may be because there are no credentials on this request. details: (Authentication is required.)
How am i supposed to tell kubernetes to authenticate? This isnt covered in the documentation. So how do i fix this?
This happened due to a recent outage to gce storage as a result of which all of us went through this error while pulling images from gcr (which uses gce storage on the backend).
Are you still seeing this error ?
as the message says, you need credentials. Are you using Google Container Engine? Then you need to run
gcloud config set project <your-project>
gcloud config set compute/zone <your-zone, like us-central1-f>
gcloud beta container clusters get-credentials --cluster <your-cluster-name>
then your GCE cluster will have the credentials