Setting boot disk size for an autoscaling Kubernetes cluster through Terraform

I am trying to set the boot disk size for a node-auto-provisioned Kubernetes cluster as follows:
resource "google_container_cluster" "gc-dev-kube-ds0" {
  # ...
  cluster_autoscaling {
    enabled = true
    resource_limits {
      resource_type = "cpu"
      minimum       = 4
      maximum       = 150
    }
    resource_limits {
      resource_type = "memory"
      minimum       = 4
      maximum       = 600
    }
    resource_limits {
      resource_type = "nvidia-tesla-v100"
      minimum       = 0
      maximum       = 4
    }
  }
  disk_size_gb = 200
}
but I am getting the following error:
Error: Unsupported argument
on kubernetes.tf line 65, in resource "google_container_cluster" "gc-dev-kube-ds0":
65: disk_size_gb = 200
An argument named "disk_size_gb" is not expected here.
I also checked the Terraform documentation, but nothing is mentioned about this.

The error occurs because disk_size_gb is an argument of the node_config block, so it must be placed there, as follows:
node_config {
  disk_size_gb = 200
}
The Terraform documentation for google_container_cluster confirms that this argument needs to be under that block.
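Putting the fix into the original resource, a minimal sketch (the autoscaling limits are abbreviated, and the other cluster arguments stay as in the question) would look like:

```hcl
resource "google_container_cluster" "gc-dev-kube-ds0" {
  # ... other cluster arguments as in the question ...

  cluster_autoscaling {
    enabled = true
    resource_limits {
      resource_type = "cpu"
      minimum       = 4
      maximum       = 150
    }
    # ... memory and GPU resource_limits as in the question ...
  }

  # disk_size_gb belongs here, not at the top level of the resource
  node_config {
    disk_size_gb = 200
  }
}
```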

Terraform Plan deleting AKS Node Pool

Terraform plan always forces AKS cluster to be recreated if we increase worker node in node pool
I tried creating an AKS cluster with 1 worker node via Terraform. It went well; the cluster is up and running.
After that, I tried to add one more worker node to my AKS cluster, and Terraform's plan showed: 2 to add, 0 to change, 2 to destroy.
I am not sure how we can increase the worker node count in an AKS node pool if doing so deletes the existing node pool.
default_node_pool {
  name                  = var.nodepool_name
  vm_size               = var.instance_type
  orchestrator_version  = data.azurerm_kubernetes_service_versions.current.latest_version
  availability_zones    = var.zones
  enable_auto_scaling   = var.node_autoscalling
  node_count            = var.instance_count
  enable_node_public_ip = var.publicip
  vnet_subnet_id        = data.azurerm_subnet.subnet.id
  node_labels = {
    "node_pool_type" = var.tags[0].node_pool_type
    "environment"    = var.tags[0].environment
    "nodepool_os"    = var.tags[0].nodepool_os
    "application"    = var.tags[0].application
    "manged_by"      = var.tags[0].manged_by
  }
}
Error
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
# azurerm_kubernetes_cluster.aks_cluster must be replaced
-/+ resource "azurerm_kubernetes_cluster" "aks_cluster" {
Thanks
Satyam
I tested the same in my environment by creating a cluster with a node count of 2 and then changing it to 3, using something like the configuration below.
If you are using http_proxy_config, any change to that block forces a replacement by default, and that is why the whole cluster gets replaced with the new configuration.
As a solution, you can use a lifecycle block in your code, as I have done below:
lifecycle {
  ignore_changes = [http_proxy_config]
}
The full code would be:
resource "azurerm_kubernetes_cluster" "aks_cluster" {
  name                    = "${var.global-prefix}-${var.cluster-id}-${var.envid}-azwe-aks-01"
  location                = data.azurerm_resource_group.example.location
  resource_group_name     = data.azurerm_resource_group.example.name
  dns_prefix              = "${var.global-prefix}-${var.cluster-id}-${var.envid}-azwe-aks-01"
  kubernetes_version      = var.cluster-version
  private_cluster_enabled = var.private_cluster

  default_node_pool {
    name                  = var.nodepool_name
    vm_size               = var.instance_type
    orchestrator_version  = data.azurerm_kubernetes_service_versions.current.latest_version
    availability_zones    = var.zones
    enable_auto_scaling   = var.node_autoscalling
    node_count            = var.instance_count
    enable_node_public_ip = var.publicip
    vnet_subnet_id        = azurerm_subnet.example.id
  }

  # RBAC and Azure AD Integration Block
  role_based_access_control {
    enabled = true
  }

  http_proxy_config {
    http_proxy  = "http://xxxx"
    https_proxy = "http://xxxx"
    no_proxy    = ["localhost", "xxx", "xxxx"]
  }

  # Identity (System Assigned or Service Principal)
  identity {
    type = "SystemAssigned"
  }

  # Add On Profiles
  addon_profile {
    azure_policy { enabled = true }
  }

  # Network Profile
  network_profile {
    network_plugin = "azure"
    network_policy = "calico"
  }

  lifecycle {
    ignore_changes = [http_proxy_config]
  }
}

How to create a healthy VPC-Native GKE cluster with Terraform?

Through Terraform, I am trying to create a VPC-Native GKE cluster in a single zone (europe-north1-b), with a separate node-pool, with the GKE cluster and node-pool in their own VPC Network.
My code looks like the following:
resource "google_container_cluster" "gke_cluster" {
  description              = "GKE Cluster for personal projects"
  initial_node_count       = 1
  location                 = "europe-north1-b"
  name                     = "prod"
  network                  = google_compute_network.gke.self_link
  remove_default_node_pool = true
  subnetwork               = google_compute_subnetwork.gke.self_link

  ip_allocation_policy {
    cluster_secondary_range_name  = local.cluster_secondary_range_name
    services_secondary_range_name = local.services_secondary_range_name
  }
}

resource "google_compute_network" "gke" {
  auto_create_subnetworks         = false
  delete_default_routes_on_create = false
  description                     = "Compute Network for GKE nodes"
  name                            = "${terraform.workspace}-gke"
  routing_mode                    = "GLOBAL"
}

resource "google_compute_subnetwork" "gke" {
  name          = "prod-gke-subnetwork"
  ip_cidr_range = "10.255.0.0/16"
  region        = "europe-north1"
  network       = google_compute_network.gke.id

  secondary_ip_range {
    range_name    = local.cluster_secondary_range_name
    ip_cidr_range = "10.0.0.0/10"
  }
  secondary_ip_range {
    range_name    = local.services_secondary_range_name
    ip_cidr_range = "10.64.0.0/10"
  }
}

locals {
  cluster_secondary_range_name  = "cluster-secondary-range"
  services_secondary_range_name = "services-secondary-range"
}
resource "google_container_node_pool" "gke_node_pool" {
  cluster    = google_container_cluster.gke_cluster.name
  location   = "europe-north1-b"
  name       = terraform.workspace
  node_count = 1
  node_locations = [
    "europe-north1-b"
  ]

  node_config {
    disk_size_gb    = 100
    disk_type       = "pd-standard"
    image_type      = "cos_containerd"
    local_ssd_count = 0
    machine_type    = "g1-small"
    preemptible     = false
    service_account = google_service_account.gke_node_pool.email
  }
}

resource "google_service_account" "gke_node_pool" {
  account_id   = "${terraform.workspace}-node-pool"
  description  = "The default service account for pods to use in ${terraform.workspace}"
  display_name = "GKE Node Pool ${terraform.workspace} Service Account"
}

resource "google_project_iam_member" "gke_node_pool" {
  member = "serviceAccount:${google_service_account.gke_node_pool.email}"
  role   = "roles/viewer"
}
However, whenever I apply this Terraform code, I receive the following error:
google_container_cluster.gke_cluster: Still creating... [24m30s elapsed]
google_container_cluster.gke_cluster: Still creating... [24m40s elapsed]
╷
│ Error: Error waiting for creating GKE cluster: All cluster resources were brought up, but: component "kube-apiserver" from endpoint "gke-xxxxxxxxxxxxxxxxxxxx-yyyy" is unhealthy.
│
│ with google_container_cluster.gke_cluster,
│ on gke.tf line 1, in resource "google_container_cluster" "gke_cluster":
│ 1: resource "google_container_cluster" "gke_cluster" {
│
╵
My cluster is then auto-deleted.
I can find no problem with my Terraform code/syntax, and have searched through Google Cloud Logging to find a more detailed error message with no luck.
So, how do I create a HEALTHY VPC-Native GKE cluster with Terraform?
It turns out the issue seemed to be the overly large subnetwork secondary ranges.
As shown in the question, I had ranges:
10.0.0.0/10 for the cluster_secondary_range.
10.64.0.0/10 for the services_secondary_range.
These /10 CIDRs cover 4194304 IP addresses each, which I figured might be too large for Google/GKE to handle(?) - especially since all of the GKE documentation uses CIDRs covering much smaller ranges for the cluster & services.
I decided to shrink these CIDR ranges to see if it would help:
10.0.0.0/12 for the cluster_secondary_range.
10.16.0.0/12 for the services_secondary_range.
These /12 CIDRs cover 1048576 IP addresses each.
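Applied to the google_compute_subnetwork resource from the question, only the two ip_cidr_range values change (a sketch; the range names and other arguments stay as before):

```hcl
resource "google_compute_subnetwork" "gke" {
  name          = "prod-gke-subnetwork"
  ip_cidr_range = "10.255.0.0/16"
  region        = "europe-north1"
  network       = google_compute_network.gke.id

  secondary_ip_range {
    range_name    = local.cluster_secondary_range_name
    ip_cidr_range = "10.0.0.0/12" # shrunk from 10.0.0.0/10
  }
  secondary_ip_range {
    range_name    = local.services_secondary_range_name
    ip_cidr_range = "10.16.0.0/12" # shrunk from 10.64.0.0/10
  }
}
```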
My cluster was created successfully after this change:
google_container_cluster.gke_cluster: Creation complete after 5m40s
Not sure WHY Google / GKE can't handle larger CIDR ranges for the cluster & services, but /12 is good enough for me and allows for successful creation of the cluster.

How to create routing for Kubernetes with an nginx ingress in Terraform (Scaleway)

I have a Kubernetes setup with a cluster and two pools (nodes). I have also set up an (nginx) ingress server for Kubernetes with Helm. All of this is written in Terraform for Scaleway. What I am struggling with is how to configure the ingress server to route to my Kubernetes pools/nodes depending on the URL path.
For example, I want [url]/api to go to my scaleway_k8s_pool.api and [url]/auth to go to my scaleway_k8s_pool.auth.
This is my terraform code
provider "scaleway" {
  zone   = "fr-par-1"
  region = "fr-par"
}

resource "scaleway_registry_namespace" "main" {
  name        = "main_container_registry"
  description = "Main container registry"
  is_public   = false
}

resource "scaleway_k8s_cluster" "main" {
  name        = "main"
  description = "The main cluster"
  version     = "1.20.5"
  cni         = "calico"
  tags        = ["i'm an awsome tag"]

  autoscaler_config {
    disable_scale_down              = false
    scale_down_delay_after_add      = "5m"
    estimator                       = "binpacking"
    expander                        = "random"
    ignore_daemonsets_utilization   = true
    balance_similar_node_groups     = true
    expendable_pods_priority_cutoff = -5
  }
}

resource "scaleway_k8s_pool" "api" {
  cluster_id  = scaleway_k8s_cluster.main.id
  name        = "api"
  node_type   = "DEV1-M"
  size        = 1
  autoscaling = true
  autohealing = true
  min_size    = 1
  max_size    = 5
}

resource "scaleway_k8s_pool" "auth" {
  cluster_id  = scaleway_k8s_cluster.main.id
  name        = "auth"
  node_type   = "DEV1-M"
  size        = 1
  autoscaling = true
  autohealing = true
  min_size    = 1
  max_size    = 5
}

resource "null_resource" "kubeconfig" {
  depends_on = [scaleway_k8s_pool.api, scaleway_k8s_pool.auth] # at least one pool here
  triggers = {
    host                   = scaleway_k8s_cluster.main.kubeconfig[0].host
    token                  = scaleway_k8s_cluster.main.kubeconfig[0].token
    cluster_ca_certificate = scaleway_k8s_cluster.main.kubeconfig[0].cluster_ca_certificate
  }
}

output "cluster_url" {
  value = scaleway_k8s_cluster.main.apiserver_url
}

provider "helm" {
  kubernetes {
    host  = null_resource.kubeconfig.triggers.host
    token = null_resource.kubeconfig.triggers.token
    cluster_ca_certificate = base64decode(
      null_resource.kubeconfig.triggers.cluster_ca_certificate
    )
  }
}

resource "helm_release" "ingress" {
  name       = "ingress"
  chart      = "ingress-nginx"
  repository = "https://kubernetes.github.io/ingress-nginx"
  namespace  = "kube-system"
}
How would I go about configuring the nginx ingress server to route to my Kubernetes pools?

Is it possible to create a zone only node pool in a regional cluster in GKE?

I have a regional cluster for redundancy. In this cluster I want to create a node pool in just one zone of the region. Is this configuration possible? The reason I am trying this is that I want to run a service like RabbitMQ in just one zone to avoid split-brain, while my application services run in all zones of the region for redundancy.
I am using Terraform to create the cluster and node pools. Below is my config for creating a regional cluster and a zonal node pool:
resource "google_container_cluster" "regional_cluster" {
  provider       = google-beta
  project        = "my-project"
  name           = "my-cluster"
  location       = "us-central1"
  node_locations = ["us-central1-a", "us-central1-b", "us-central1-c"]

  master_auth {
    username = ""
    password = ""
    client_certificate_config {
      issue_client_certificate = false
    }
  }
}

resource "google_container_node_pool" "one_zone" {
  project  = google_container_cluster.regional_cluster.project
  name     = "zone-pool"
  location = "us-central1-b"
  cluster  = google_container_cluster.regional_cluster.name

  node_config {
    machine_type = var.machine_type
    image_type   = var.image_type
    disk_size_gb = 100
    disk_type    = "pd-standard"
  }
}
This throws an error message
error creating NodePool: googleapi: Error 404: Not found: projects/my-project/zones/us-central1-b/clusters/my-cluster., notFound
It turned out that location in google_container_node_pool should specify the cluster master's region/zone. To actually specify the node pool's zones, node_locations should be used. Below is the config that worked:
resource "google_container_cluster" "regional_cluster" {
  provider       = google-beta
  project        = "my-project"
  name           = "my-cluster"
  location       = "us-central1"
  node_locations = ["us-central1-a", "us-central1-b", "us-central1-c"]

  master_auth {
    username = ""
    password = ""
    client_certificate_config {
      issue_client_certificate = false
    }
  }
}

resource "google_container_node_pool" "one_zone" {
  project        = google_container_cluster.regional_cluster.project
  name           = "zone-pool"
  location       = google_container_cluster.regional_cluster.location
  node_locations = ["us-central1-b"]
  cluster        = google_container_cluster.regional_cluster.name

  node_config {
    machine_type = var.machine_type
    image_type   = var.image_type
    disk_size_gb = 100
    disk_type    = "pd-standard"
  }
}

Autoscaling GKE node pool stuck at 0 instances even with autoscaling set at min 3 max 5?

I've created a cluster using terraform with:
provider "google" {
  credentials = "${file("gcp.json")}"
  project     = "${var.gcp_project}"
  region      = "us-central1"
  zone        = "us-central1-c"
}

resource "google_container_cluster" "primary" {
  name     = "${var.k8s_cluster_name}"
  location = "us-central1-a"
  project  = "${var.gcp_project}"

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1

  master_auth {
    username = ""
    password = ""
    client_certificate_config {
      issue_client_certificate = false
    }
  }
}

resource "google_container_node_pool" "primary_preemptible_nodes" {
  project  = "${var.gcp_project}"
  name     = "my-node-pool"
  location = "us-central1-a"
  cluster  = "${google_container_cluster.primary.name}"
  # node_count = 3

  autoscaling {
    min_node_count = 3
    max_node_count = 5
  }

  node_config {
    # preemptible  = true
    machine_type = "g1-small"
    metadata = {
      disable-legacy-endpoints = "true"
    }
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/devstorage.read_only"
    ]
  }
}
Surprisingly, this node pool seems to be 'stuck' at 0 instances. Why? How can I diagnose this?
You should add initial_node_count (e.g. initial_node_count = 3) to the google_container_node_pool resource.
The official documentation says you should not use node_count together with autoscaling.
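Based on that, a sketch of the corrected node pool (only the relevant arguments shown; node_config and the rest stay as in the question) would be:

```hcl
resource "google_container_node_pool" "primary_preemptible_nodes" {
  project  = "${var.gcp_project}"
  name     = "my-node-pool"
  location = "us-central1-a"
  cluster  = "${google_container_cluster.primary.name}"

  # Starting size; the autoscaler then adjusts between min and max.
  # Do not set node_count here when autoscaling is enabled.
  initial_node_count = 3

  autoscaling {
    min_node_count = 3
    max_node_count = 5
  }

  # node_config { ... } as in the question
}
```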