Missing clusters after resolving GCP billing.
Gcp project had unresolved billing.
This led to instances and k8 resources unaccessible.
We resolved the billing.
All instances and (k8) instance groups came back.
Problem the clusters are nowhere to be seen.
gcloud container clusters list
*empty*
Before this we had services running.
All loadbalancers are visible under network services
How do I rebuild the cluster from the instances?
By any chance was the unresolved billing related to the end of the GCP free trial?
If this was the case, if this was less than 30 days, there is still a good chance of getting your clusters back. Since at then end of the free trial all services are stopped and sent for deletion, as outlined here. But there is a grace period of 30 days so these services can be recovered
If this is still before the 30 days, and you have upgraded to a paid GCP plan and the clusters have not showed up, I would indeed recommend to contact GCP support to see what is going on
Related
I'm trying to run open-source with minimal costs on the cloud and would love to run it on k8s without the hassle of managing it (managed k8s cluster). Is there a free tier option for a small-scale project in any cloud provider?
If there is one, which parameters should I choose to get the free tier?
You can use IBM cloud which provides a single worker node Kubernetes cluster along with container registry like other cloud providers. This is more than enough for a beginner to try the concepts of Kubernetes.
You can also use Tryk8s which provides a playground for trying Kubernetes for free. Play with Kubernetes is a labs site provided by Docker and created by Tutorius. Play with Kubernetes is a playground which allows users to run K8s clusters in a matter of seconds. It gives the experience of having a free Alpine Linux Virtual Machine in the browser. Under the hood Docker-in-Docker (DinD) is used to give the effect of multiple VMs/PCs.
If you want to use more services and resources, based on your use case you can try other cloud providers, they may not provide an indefinitely free trial but have no restriction on the resources.
For Example, Google Kubernetes engine(GKE) provides $300 credit to fully explore and conduct an assessment of Google Cloud. You won’t be charged until you upgrade which can be used for a 3 month period from the account creation. There is no restriction on the resources and the number of nodes for creating a cluster. You can add Istio and Try Cloud Run (Knative) also.
Refer Free Kubernetes which Lists the free Trials/Credit for Managed Kubernetes Services.
All cluster resources were brought up, but: only 0 nodes out of 4 have registered; this is likely due to Nodes failing to start correctly; check VM Serial logs for errors, try re-creating the cluster or contact support if that doesn't work.
I've tried recreating it several times in different zones figuring google was doing maintenance. I've also tried with fewer node sizes with no luck.
My Kubernetes Engine cluster keeps rebooting one of my nodes, even though all pods on the node are "well-behaved". I've tried to look at the cluster's Stackdriver logs, but was not able to find a reason. After a while, the continuous reboots usually stop, only to occur again a few hours or days later.
Usually only one single node is affected, while the other nodes are fine, but deleting that node and creating a new one in its place only helps temporarily.
I have already disabled node auto-repair to see if that makes a difference (it was turned on before), and if I recall correctly this started after upgrading my cluster to Kubernetes 1.13 (specifically version 1.13.5-gke). The issue has persisted after upgrading to 1.13.6-gke.0. Even creating a new node pool and migrating to it had no effect.
The cluster consists of four nodes with 1 CPU and 3 GB RAM each. I know that's small for a k8s cluster, but this has worked fine in the past.
I am using the new Stackdriver Kubernetes Monitoring as well as Istio on GKE.
Any pointers as to what could be the reason or where I look for possible causes would be appreciated.
Screenshots of the Node event list (happy to provide other logs; couldn't find anything meaningful in Stackdriver Logging yet):
Posting this answer as a community wiki to give some troubleshooting tips/steps as the underlying issue wasn't found.
Feel free to expand it.
After below steps, the issue with a node rebooting were not present anymore:
Updated the Kubernetes version (GKE)
Uninstalling Istio
Using e2-medium instances as nodes.
As pointed by user #aurelius:
I would start from posting the kubectl describe node maybe there is something going on before your Node gets rebooted and unhealthy. Also do you use resources and limits? Can this restarts be a result of some burstable workload? Also have you tried checking system logs after the restart on the Node itself? Can you post the results? – aurelius Jun 7 '19 at 15:38
Above comment could be a good starting point for troubleshooting issues with the cluster.
Options to troubleshoot the cluster pointed in comment:
$ kubectl describe node focusing on output in:
Conditions - KubeletReady, KubeletHasSufficientMemory, KubeletHasNoDiskPressure, etc.
Allocated resources - Requests and Limits of scheduled workloads
Checking system logs after the restart on the node itself:
GCP Cloud Console (Web UI) -> Logging -> Legacy Logs Viewer/Logs Explorer -> VM Instance/GCE Instance
It could be also beneficiary to check the CPU/RAM usage in:
GCP Cloud Console (Web UI) -> Monitoring -> Metrics Explorer
You can also check if there are any operations on the cluster:
gcloud container operations list
Adding to above points:
Creating a cluster with Istio on GKE
We suggest creating at least a 4 node cluster with the 2 vCPU machine type when using this add-on. You can deploy Istio itself with the default GKE new cluster setup but this may not provide enough resources to explore sample applications.
-- Cloud.google.com: Istio: Docs: Istio on GKE: Installing
Also, the official docs of Istio are stating:
CPU and memory
Since the sidecar proxy performs additional work on the data path, it consumes CPU and memory. As of Istio 1.7, a proxy consumes about 0.5 vCPU per 1000 requests per second.
-- Istio.io: Docs: Performance and scalability: CPU and memory
Additional resources:
Cloud.google.com: Kubernetes Engine: Docs: Troubleshooting
Kubernetes.io: Docs: Debug cluster
I have a GKE cluster running in us-central1 with a preemptable node pool. I have nodes in each zone (us-central1-b,us-central1-c,us-central1-f). For the last 10 hours, I get the following error for the underlying node vm:
Instance '[instance-name]' creation failed: The zone
'[instance-zone]'
does not have enough resources available to fulfill
the request. Try a different zone, or try again
later.
I tried creating new clusters in different regions with different machine types, using HA (multi-zone) settings and I get the same error for every cluster.
I saw an issue on Google Cloud Status Dashboard and tried with the console, as recommended, and it errors out with a timeout error.
Is anyone else having this problem? Any idea what I may be dong wrong?
UPDATES
Nov 11
I stood up a cluster in us-west2, this was the only one which would work. I used gcloud command line, it seems the UI was not effective. There was a note similar to this situation, use gcloud not ui, on the Google Cloud Status Dashboard.
I tried creating node pools in us-central1 with the gcloud command line, and ui, to no avail.
I'm now federating deployments across regions and standing up multi-region ingress.
Nov. 12
Cannot create HA clusters in us-central1; same message as listed above.
Reached out via twitter and received a response.
Working with the K8s guide to federation to see if I can get multi-cluster running. Most likely going to use Kelsey Hightowers approach
Only problem, can't spin up clusters to federate.
Findings
Talked with google support, need a $150/mo. package to get a tech person to answer my questions.
Preemptible instances are not a good option for a primary node pool. I did this because I'm cheap, it bit me hard.
The new architecture is a primary node pool with committed use VMs that do not autoscale, and a secondary node pool with preemptible instances for autoscale needs. The secondary pool will have minimum nodes = 0 and max nodes = 5 (for right now); this cluster is regional so instances are across all zones.
Cost for an n1-standard-1 sustained use (assuming 24/7) a 30% discount off list.
Cost for a 1-year n1-standard-1 committed use is about ~37% discount off list.
Preemptible instances are re-provisioned every 24hrs., if they are not taken from you when resource needs spike in the region.
I believe I fell prey to a resource spike in the us-central1.
A must-watch for people looking to federate K8s: Kelsey Hightower - CNCF Keynote | Kubernetes Federation
Issue appears to be resolved as of Nov 13th.
I have a micro service scaled out across several pods in a Google Cloud Kubernetes Engine. Being in a multi-cloud-shop, we have our logging/monitoring/telemetry in Azure Application Insights.
Our data should be kept inside Europe, so our GCP Kubernetes cluster is set up with
Master zone: europe-west1-b
Node zones: europe-west1-b
When I create a node pool on this cluster, the nodes apparently has the zone europe-west1-b (as expected), seen from the Google Cloud Platform Console "Node details".
However, in Azure Application Insights, from the telemetry reported from the applications running in pods in this node pool, the client_City is reported as "Mountain View" and client_StateOrProvince is "California", and some cases "Ann Arbor" in "Michigan".
At first I waived this strange location as just some inter-cloud-issue (e.g. defaulting to something strange when not filling out the information as expected on the receiving end, or similar).
But now, Application Insights actually pointed out that there is a quite significant performance difference depending on if my pod is running in Michigan or in California, which lead me to belive that these fields are actually correct.
Is GCP fooling me? Am I looking at the wrong place? How can I make sure my GCP Kubernetes nodes are running in Europe?
This is essential for me to know, both from a GCPR perspective, and of course performance (latency) wise.
the Azure Application Insights are fooling you, because the external IP was registered by Google in California, not considering that these are used by data-centers distributed all over the globe. also have a GCE instance deployed to Frankfurt am Main, while the IP appears as if it would be Mountain View. StackDriver might report the actual locations (and not some vague GeoIP locations).