Google Container Engine New cluster appears to have failed - kubernetes

I tried to create a new cluster in Container Engine in the Google Developers Console.
It finished pretty quickly with a yellow triangle with an exclamation point. I'm assuming that means it didn't work.
Any idea what I could be doing wrong?

There are a few things that could go wrong. The best way to figure out what went wrong in your situation is to try the gcloud command-line tool, which gives better error information. Information about how to install and use it is in Container Engine's documentation.
Other than the default network being removed (as mentioned by Robert Bailey), you may be trying to create more VM instances than you have quota for. You can check your quota in the developer console under Compute > Compute Engine > Quota. You're most likely to go over quota on either CPUs or in-use IP addresses, since each VM that is created is given an ephemeral IP address.
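If you prefer the command line, the regional quotas can be inspected with gcloud (a sketch; us-central1 is just an example region, and jq is assumed to be installed):

```shell
# Show current usage vs. limit for the two quotas most often hit
# when creating a cluster: CPUs and in-use IP addresses.
gcloud compute regions describe us-central1 --format=json | \
  jq '.quotas[] | select(.metric == "CPUS" or .metric == "IN_USE_ADDRESSES")'
```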

Have you deleted your default network?
The alpha version of Container Engine relies on the default network when creating VMs and routes between the nodes and you will see an error creating a cluster if you have deleted the default network.
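You can quickly confirm whether the default network still exists in your project (a sketch, assuming the gcloud tool is installed and authenticated):

```shell
# If a network named "default" is missing from this list,
# cluster creation will fail in the alpha version.
gcloud compute networks list
```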

Related

Is there a way to enable nested virtualization in GKE cluster node?

I am trying to use KubeVirt with GKE cluster.
I found I am able to create a nested virtualization enabled GCP VM, but I didn't find a way to achieve the same thing for GKE cluster node.
If I cannot enable nested virtualization for GKE cluster node, I can only use the kubevirt with debug.useEmulation which is not what I want.
Thanks
Yes you can -- it isn't even hard to do, it just isn't very intuitive.
Start a GKE cluster with ubuntu/containerd, n1-standard nodes, and a minimum CPU platform of Haswell. I think you also need to enable "Basic Authentication" to get virtctl working (sorry).
Find the template used for your new cluster, then determine the proper source image:
gcloud compute instance-templates describe $TEMPLATE_NAME --format=json | jq -r ".properties.disks[0].initializeParams.sourceImage"
Create a copy of the source disk with nested virtualization enabled:
gcloud compute images create $NEW_IMAGE_NAME --project $PROJECT --source-image $SOURCE_IMAGE --source-image-project $SOURCE_PROJECT --licenses "https://www.googleapis.com/compute/v1/projects/vm-options/global/licenses/enable-vmx"
Use "Create Similar" on the template for your GKE cluster. Change the boot disk to $NEW_IMAGE_NAME. You will also need to drill down to networking/alias and change the default subnet to your pod network.
Trigger a rolling update on the group for your GKE nodes to move them to the new template.
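The rolling update in the last step can also be done with gcloud; the group, template, and zone names below are placeholders:

```shell
# Replace the managed instance group's VMs so they pick up the
# new template that points at the nested-virt-enabled image.
gcloud compute instance-groups managed rolling-action start-update $GROUP_NAME \
  --version template=$NEW_TEMPLATE_NAME \
  --zone $ZONE
```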
You can now install kubevirt (I had to use 0.38.1 instead of the then-current release).
Caveats: I don't know how to use Google disk images for kubevirt, which would be an obvious match. I haven't figured out how to get private GCR working with CDI either. Oh, and the console doesn't work due to websocket problems. But... you can shell into a GKE node and see /dev/kvm, and you can create a VM with kubevirt and then ssh into it, so yes, it does work.
Anyone know how to make any of this better?
Currently, nested virtualization is available only on GCE, as per these docs.
There is already an open question about supporting nested virtualization on GKE, which can be found here. I'd say it hasn't been introduced yet; that's why you cannot find proper documentation about GKE and nested virtualization.
Also, please keep in mind that GCE and GKE are quite different.
A Google Compute Engine VM instance is unmanaged by Google, so beyond the ready-made base image you can do whatever you need, as with any normal VM.
Google Kubernetes Engine, however, was created especially for containers. Those VMs are managed by Google: GKE creates the cluster for you, and all VMs are automatically part of the cluster. In GKE you are unable to run Minikube or kubeadm.
Here are some characteristics of GKE

Ephemeral Storage usage in AKS

I have a simple 3-node cluster created using AKS. Everything had been going fine for 3 months. However, I'm starting to have disk space usage issues that seem related to the OS disks attached to each node.
I have no error in kubectl describe node and all disk-related checks are fine. However, when I try to run kubectl logs on some pods, I sometimes get "no space left on device".
How can one manage the storage used in those disks? I can't seem to find a way to SSH into those nodes, as they seem to be manageable only via the Azure CLI / web interface. Is there also a way to clean up what takes up this space? (I assume unused docker images would take up space, but I was under the impression that those would get cleaned automatically...)
Generally, the AKS nodes just run the pods or other resources for you; the data should be stored elsewhere, such as on a remote storage server. In Azure, that means managed disks and Azure File shares. You can also store growing data on the nodes themselves, but then you need to configure a large disk for each node, and I don't think that's a good approach.
To SSH into the AKS nodes, there are ways. One is to manually set a NAT rule in the load balancer for the node you want to SSH into. Another is to create a pod as a jump box, following the steps here.
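On recent kubectl versions there is also a third option that needs no SSH setup: `kubectl debug` can start a privileged pod directly on a node (a sketch; the node name is a placeholder):

```shell
# Find the node that is low on disk.
kubectl get nodes

# Start a debugging pod on that node; the node's root filesystem
# is mounted at /host inside the pod.
kubectl debug node/aks-nodepool1-12345678-0 -it --image=ubuntu

# Then, inside the debugging pod:
#   chroot /host
#   df -h        # check disk usage on the node itself
```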
Finally, AKS deletes unused images regularly and automatically; it's not recommended to delete unused images manually.
Things you can do to fix this:
Create the AKS cluster with a bigger OS disk (I usually use 128 GB)
Upgrade AKS to a newer version (this would replace all the existing vms with new ones, so they won't have stale docker images on them)
Manually clean up space on nodes
Manually extend the OS disk on the nodes (this will only work until you scale/upgrade the cluster)
I'd probably go with option 1, else this problem would haunt you forever :(
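If you do go with option 3 at some point, the cleanup itself is a couple of commands once you have a shell on the node; which one applies depends on the container runtime (generic examples, not AKS-specific):

```shell
# Docker-based nodes: remove all images not used by a container.
docker image prune -a -f

# containerd-based nodes (via crictl): remove images not
# referenced by any running container.
crictl rmi --prune
```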

GKE autoscaling doesn't scale

I am setting up a Kubernetes cluster on Google using the Google Kubernetes Engine. I have created the cluster with auto-scaling enabled on my nodepool.
As far as I understand this should be enough for the cluster to spin up extra nodes if needed.
But when I run some load on my cluster, the HPA is activated and wants to spin up some extra instances but can't deploy them due to 'insufficient cpu'. At this point I expected the auto-scaling of the cluster to kick into action but it doesn't seem to scale up. I did however see this:
So the node that is wanting to be created (I guess thanks to the auto-scaler?) can't be created with following message: Quota 'IN_USE_ADDRESSES' exceeded. Limit: 8.0 in region europe-west1.
I also didn't touch the auto-scaling on the instance group, so when running gcloud compute instance-groups managed list, it shows as 'autoscaled: no'
So any help getting this autoscaling to work would be appreciated.
TL;DR I guess the reason it isn't working is: Quota 'IN_USE_ADDRESSES' exceeded. Limit: 8.0 in region europe-west1, but I don't know how I can fix it.
You have really debugged it yourself already. You need to edit the quotas in the GCP Console. Make sure you select the correct project, then increase all the quotas that are low: probably in-use addresses and CPUs in the region. This process is only semi-automated, so you might need to wait a bit and possibly pay a deposit.
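Before requesting the increase, you can confirm from the command line exactly which quotas are exhausted (the region matches the error message in the question; jq is assumed to be installed):

```shell
# Print only the quotas that have hit their limit in the region.
gcloud compute regions describe europe-west1 --format=json | \
  jq '.quotas[] | select(.usage >= .limit)'
```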

Choosing a different vm type for cluster master and resizing number of nodes

How can I specify a specific VM type for the cluster master? (I don't want to use a high-memory instance for a relatively inactive node.)
Also, is there any way to add nodes to a cluster while choosing the VM type? (This could solve the first problem.)
Update November 2015:
Now that Google Container Engine is no longer in alpha, you don't need to worry about the size of your cluster master, as it is part of the managed service.
You can now easily add/remove nodes from your cluster through the cloud console UI, but they will all be the same machine type that you originally chose for your cluster.
If you are running OSS Kubernetes on GCE, then you can set the MASTER_SIZE environment variable in cluster/gce/config-default.sh before creating your cluster.
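For the OSS-on-GCE case, that looks something like the following (a sketch; the machine type is just an example):

```shell
# MASTER_SIZE is read by cluster/gce/config-default.sh at cluster
# creation time; export it before running kube-up.
export MASTER_SIZE=n1-standard-2
cluster/kube-up.sh
```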
If you are running on GKE, we unfortunately don't yet offer the option to customize the size of your master differently than the size of your nodes. We hope to offer more flexibility in cluster provisioning soon.
There is currently not a way to resize your cluster after you create it. I'm actually working on this for OSS Kubernetes in Issue #3168.

What image does Google Container Engine (GKE) use?

In the docs for GKE it says all nodes (currently) have the same VM instance. Does this refer to the underlying machine type or the OS image (or both)?
I was assuming it was just the machine type (micro, small,.. etc) and Google layered their own image with infrastructure on top of that (e.g. kubernetes).
If this is the case what image does Google use on GKE? I was thinking it may be CoreOS, since that would seem to be a good match, but I am not sure.
I'd like to set up staging machines with the same image as production... but perhaps we don't need to know this or it doesn't matter what is used.
All nodes in the cluster currently have the same machine type and OS image. By default, the machine type is n1-standard-1 and the image is a recent container-vm image.
If you use gcloud to create your cluster, both settings can be overridden on the command line using the --machine-type and --source-image options respectively (documentation).
If you are using the cloud console to create your cluster, you can specify the machine type but not currently the source image.
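With gcloud, overriding the machine type at creation time looks roughly like this (the cluster name and machine type are examples):

```shell
# Create a cluster whose nodes use a larger machine type
# than the default n1-standard-1.
gcloud container clusters create my-cluster \
  --machine-type n1-standard-2 \
  --num-nodes 3
```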
Be aware that if you specify a different source image, you may not end up with a functional cluster because the kubernetes software that is installed on top of the source image requires specific underlying packages to be present in the system software. If you want consistency between staging/prod, you can use
gcloud container clusters describe <staging-cluster-name>
to see what image is being used in your staging cluster and ensure that you end up with the same image for your production cluster.