Managing GCP Composer Kubernetes cluster using Terraform

I've created a GCP Composer environment using Terraform:
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "3.5.0"
    }
  }
}

provider "google" {
  credentials = file("my_key.json")
  project     = "my_project_id"
  region      = "us-east1"
  zone        = "us-east1-b"
}

resource "google_composer_environment" "my_composer_id" {
  name   = "my_composer_name"
  region = "us-east1"

  config {
    node_count = 3

    node_config {
      zone         = "us-east1-b"
      machine_type = "n1-standard-1"
    }
  }
}
Composer also automatically creates a Kubernetes Engine cluster. That cluster has a single node pool called default-pool. I'd like to create a new node pool inside the cluster created by Composer, something like this:
resource "google_container_node_pool" "my_node_pool_id" {
name = "my_node_pool_name"
location = "us-east1"
cluster = ????
node_count = 0
node_config {
preemptible = true
machine_type = "n1-standard-1"
}
autoscaling {
min_node_count = 0
max_node_count = 3
}
}
However, since I didn't create the cluster in the Terraform file (it was automatically created by Composer), I don't have a reference to it.

The cluster name can be accessed via the gke_cluster key, available in the config section of your Cloud Composer environment:
resource "google_container_node_pool" "my_node_pool_id" {
name = "my_node_pool_name"
location = "us-east1-b"
cluster = element(
split("/",
lookup(
google_composer_environment.my_composer_id.config[0],
"gke_cluster"
)
),
5
)
// ...
}
The element at index 5 corresponds to the name of the GKE cluster.
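For illustration, the gke_cluster value is the cluster's full resource path; assuming a zonal Composer-managed cluster, it looks roughly like the sketch below (the cluster name itself is generated by Composer), which is why the element at index 5 of the split is the cluster name:
# Hypothetical value of config[0].gke_cluster for a zonal Composer-managed cluster:
#   "projects/my_project_id/zones/us-east1-b/clusters/us-east1-my-composer-name-1234abcd-gke"
# After split("/", ...):
#   index 0: "projects"    index 1: "my_project_id"
#   index 2: "zones"       index 3: "us-east1-b"
#   index 4: "clusters"    index 5: "us-east1-my-composer-name-1234abcd-gke"  <- cluster name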

Related

Automated .kube/config file created with Terraform GKE module doesn't work

We are using the Terraform module below to create the GKE cluster and the local config file, but the kubectl command doesn't work; the terminal just keeps waiting. We then have to manually run gcloud container clusters get-credentials (shown for reference after the configuration below) to fetch and set up the local config credentials after cluster creation, and only then does kubectl work. Running the gcloud command after the Terraform setup doesn't feel elegant, so please help to correct what's wrong.
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 3.42.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">= 3.0.0"
    }
  }
}

provider "google" {
  credentials = var.gcp_creds
  project     = var.project_id
  region      = var.region
  zone        = var.zone
}

provider "google-beta" {
  credentials = var.gcp_creds
  project     = var.project_id
  region      = var.region
  zone        = var.zone
}

module "gke" {
  source                     = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  project_id                 = var.project_id
  name                       = "cluster"
  regional                   = true
  region                     = var.region
  zones                      = ["${var.region}-a", "${var.region}-b", "${var.region}-c"]
  network                    = module.vpc.network_name
  subnetwork                 = module.vpc.subnets_names[0]
  ip_range_pods              = "gke-pod-ip-range"
  ip_range_services          = "gke-service-ip-range"
  horizontal_pod_autoscaling = true
  enable_private_nodes       = true
  master_ipv4_cidr_block     = "${var.control_plane_cidr}"

  node_pools = [
    {
      name           = "node-pool"
      machine_type   = "${var.machine_type}"
      node_locations = "${var.region}-a,${var.region}-b,${var.region}-c"
      min_count      = "${var.node_pools_min_count}"
      max_count      = "${var.node_pools_max_count}"
      disk_size_gb   = "${var.node_pools_disk_size_gb}"
      auto_repair    = true
      auto_upgrade   = true
    },
  ]
}

# Configure the authentication and authorisation of the cluster
module "gke_auth" {
  source       = "terraform-google-modules/kubernetes-engine/google//modules/auth"
  depends_on   = [module.gke]
  project_id   = var.project_id
  location     = module.gke.location
  cluster_name = module.gke.name
}

# Define a local file that will store the necessary info such as certificate, user and endpoint to access the cluster
resource "local_file" "kubeconfig" {
  content  = module.gke_auth.kubeconfig_raw
  filename = "~/.kube/config"
}
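For reference, the manual workaround we currently run after terraform apply is roughly the following (the cluster name matches the module's name argument; region and project are placeholders):
gcloud container clusters get-credentials cluster --region "$REGION" --project "$PROJECT_ID"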
UPDATE:
We found that the automated .kube/config file was created with the wrong public IP of the API server, whereas the one created with the gcloud command contains the correct IP. Any guesses why the Terraform-generated kubeconfig is fetching the wrong public IP?

Terraform Plan deleting AKS Node Pool

Terraform plan always forces the AKS cluster to be recreated if we increase the worker node count in the node pool.
Creating an AKS cluster with 1 worker node via Terraform went well; the cluster is up and running.
After that, I tried to add one more worker node to my AKS cluster, and terraform plan showed: Plan: 2 to add, 0 to change, 2 to destroy.
I'm not sure how we can increase the worker node count in the AKS node pool if it deletes the existing node pool.
default_node_pool {
  name                  = var.nodepool_name
  vm_size               = var.instance_type
  orchestrator_version  = data.azurerm_kubernetes_service_versions.current.latest_version
  availability_zones    = var.zones
  enable_auto_scaling   = var.node_autoscalling
  node_count            = var.instance_count
  enable_node_public_ip = var.publicip
  vnet_subnet_id        = data.azurerm_subnet.subnet.id

  node_labels = {
    "node_pool_type" = var.tags[0].node_pool_type
    "environment"    = var.tags[0].environment
    "nodepool_os"    = var.tags[0].nodepool_os
    "application"    = var.tags[0].application
    "manged_by"      = var.tags[0].manged_by
  }
}
Error
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
# azurerm_kubernetes_cluster.aks_cluster must be replaced
-/+ resource "azurerm_kubernetes_cluster" "aks_cluster" {
I tested the same in my environment by creating a cluster with a node count of 2 and then changing it to 3, using something like the configuration below.
If you are using http_proxy_config, then by default any change to that block forces a replacement, and that's the reason the whole cluster gets replaced with the new configuration.
So, as a solution, you can use a lifecycle block in your code as I have done below:
lifecycle {
  ignore_changes = [http_proxy_config]
}
The complete code will be:
resource "azurerm_kubernetes_cluster" "aks_cluster" {
name = "${var.global-prefix}-${var.cluster-id}-${var.envid}-azwe-aks-01"
location = data.azurerm_resource_group.example.location
resource_group_name = data.azurerm_resource_group.example.name
dns_prefix = "${var.global-prefix}-${var.cluster-id}-${var.envid}-azwe-aks-01"
kubernetes_version = var.cluster-version
private_cluster_enabled = var.private_cluster
default_node_pool {
name = var.nodepool_name
vm_size = var.instance_type
orchestrator_version = data.azurerm_kubernetes_service_versions.current.latest_version
availability_zones = var.zones
enable_auto_scaling = var.node_autoscalling
node_count = var.instance_count
enable_node_public_ip = var.publicip
vnet_subnet_id = azurerm_subnet.example.id
}
# RBAC and Azure AD Integration Block
role_based_access_control {
enabled = true
}
http_proxy_config {
http_proxy = "http://xxxx"
https_proxy = "http://xxxx"
no_proxy = ["localhost","xxx","xxxx"]
}
# Identity (System Assigned or Service Principal)
identity {
type = "SystemAssigned"
}
# Add On Profiles
addon_profile {
azure_policy {enabled = true}
}
# Network Profile
network_profile {
network_plugin = "azure"
network_policy = "calico"
}
lifecycle {
ignore_changes = [http_proxy_config]
}
}

Is it possible to create a zone only node pool in a regional cluster in GKE?

I have a regional cluster for redundancy. In this cluster I want to create a node pool in just one zone of the region. Is this configuration possible? The reason I'm trying this is that I want to run a service like RabbitMQ in just one zone to avoid split-brain, while my application services run in all zones of the region for redundancy.
I am using Terraform to create the cluster and node pools; below is my config for creating the regional cluster and the zonal node pool:
resource "google_container_cluster" "regional_cluster" {
provider = google-beta
project = "my-project"
name = "my-cluster"
location = "us-central1"
node_locations = ["us-central1-a", "us-central1-b", "us-central1-c"]
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
}
resource "google_container_node_pool" "one_zone" {
project = google_container_cluster.regional_cluster.project
name = "zone-pool"
location = "us-central1-b"
cluster = google_container_cluster.regional_cluster.name
node_config {
machine_type = var.machine_type
image_type = var.image_type
disk_size_gb = 100
disk_type = "pd-standard"
}
}
This throws the following error message:
error creating NodePool: googleapi: Error 404: Not found: projects/my-project/zones/us-central1-b/clusters/my-cluster., notFound
It turns out that location in google_container_node_pool should specify the cluster master's region/zone. To actually specify the node pool's zones, node_locations should be used. Below is the config that worked:
resource "google_container_cluster" "regional_cluster" {
provider = google-beta
project = "my-project"
name = "my-cluster"
location = "us-central1"
node_locations = ["us-central1-a", "us-central1-b", "us-central1-c"]
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
}
resource "google_container_node_pool" "one_zone" {
project = google_container_cluster.regional_cluster.project
name = "zone-pool"
location = google_container_cluster.regional_cluster.location
node_locations = ["us-central1-b"]
cluster = google_container_cluster.regional_cluster.name
node_config {
machine_type = var.machine_type
image_type = var.image_type
disk_size_gb = 100
disk_type = "pd-standard"
}
}
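As a quick check (assuming the gcloud CLI is pointed at the same project), you can describe the pool and confirm that its locations field lists only the single zone:
# hypothetical verification step; the output's "locations" field should list only us-central1-b
gcloud container node-pools describe zone-pool --cluster my-cluster --region us-central1 --project my-project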

Autoscaling GKE node pool stuck at 0 instances even with autoscaling set at min 3 max 5?

I've created a cluster using Terraform with:
provider "google" {
credentials = "${file("gcp.json")}"
project = "${var.gcp_project}"
region = "us-central1"
zone = "us-central1-c"
}
resource "google_container_cluster" "primary" {
name = "${var.k8s_cluster_name}"
location = "us-central1-a"
project = "${var.gcp_project}"
# We can't create a cluster with no node pool defined, but we want to only use
# separately managed node pools. So we create the smallest possible default
# node pool and immediately delete it.
remove_default_node_pool = true
initial_node_count = 1
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
}
resource "google_container_node_pool" "primary_preemptible_nodes" {
project = "${var.gcp_project}"
name = "my-node-pool"
location = "us-central1-a"
cluster = "${google_container_cluster.primary.name}"
# node_count = 3
autoscaling {
min_node_count = 3
max_node_count = 5
}
node_config {
# preemptible = true
machine_type = "g1-small"
metadata = {
disable-legacy-endpoints = "true"
}
oauth_scopes = [
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
"https://www.googleapis.com/auth/devstorage.read_only"
]
}
}
Surprisingly, this node pool seems to be 'stuck' at 0 instances. Why? How can I diagnose this?
You should add initial_node_count (e.g. initial_node_count = 3) to the google_container_node_pool resource.
The official documentation says you should not use node_count together with autoscaling.
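For illustration, here is a minimal sketch of the node pool from the question with initial_node_count added and node_count left out entirely (the value 3 matches the autoscaler's minimum; everything else is unchanged from the question):
resource "google_container_node_pool" "primary_preemptible_nodes" {
  project            = "${var.gcp_project}"
  name               = "my-node-pool"
  location           = "us-central1-a"
  cluster            = "${google_container_cluster.primary.name}"
  initial_node_count = 3  # starting size; the autoscaler then keeps the pool between 3 and 5 nodes

  autoscaling {
    min_node_count = 3
    max_node_count = 5
  }

  node_config {
    machine_type = "g1-small"

    metadata = {
      disable-legacy-endpoints = "true"
    }

    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/devstorage.read_only"
    ]
  }
}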

Terraform - staggered provider population

I have been looking at implementing Kubernetes with Terraform over the past week and I seem to have a lifecycle issue.
While I can make a Kubernetes resource depend on a cluster being spun up, the KUBECONFIG file isn't updated in the middle of the terraform apply.
The Kubernetes resource looks like this:
resource "kubernetes_service" "example" {
...
depends_on = ["digitalocean_kubernetes_cluster.example"]
}
resource "digitalocean_kubernetes_cluster" "example" {
name = "example"
region = "${var.region}"
version = "1.12.1-do.2"
node_pool {
name = "woker-pool"
size = "s-1vcpu-2gb"
node_count = 1
}
provisioner "local-exec" {
command = "sh ./get-kubeconfig.sh" // gets KUBECONFIG file from digitalocean API.
environment = {
digitalocean_kubernetes_cluster_id = "${digitalocean_kubernetes_cluster.k8s.id}"
digitalocean_kubernetes_cluster_name = "${digitalocean_kubernetes_cluster.k8s.name}"
digitalocean_api_token = "${var.digitalocean_token}"
}
}
While I can pull the KUBECONFIG file down using the API, Terraform won't use this file, because the plan is already in motion.
I've seen some examples using ternary operators (resource ? 1 : 0), but I haven't found a workaround for clusters not created with count, besides -target.
Ideally, I'd like to create all of this with one Terraform repo.
It turns out that the digitalocean_kubernetes_cluster resource has attributes which can be passed to the provider "kubernetes" {} block like so:
resource "digitalocean_kubernetes_cluster" "k8s" {
name = "k8s"
region = "${var.region}"
version = "1.12.1-do.2"
node_pool {
name = "woker-pool"
size = "s-1vcpu-2gb"
node_count = 1
}
}
provider "kubernetes" {
host = "${digitalocean_kubernetes_cluster.k8s.endpoint}"
client_certificate = "${base64decode(digitalocean_kubernetes_cluster.k8s.kube_config.0.client_certificate)}"
client_key = "${base64decode(digitalocean_kubernetes_cluster.k8s.kube_config.0.client_key)}"
cluster_ca_certificate = "${base64decode(digitalocean_kubernetes_cluster.k8s.kube_config.0.cluster_ca_certificate)}"
}
This results in one provider being dependent on the other, and Terraform sequences the two accordingly.
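As a minimal usage sketch (the namespace name is just an illustration), any kubernetes_* resource declared alongside this provider block will then wait for the cluster and its credentials:
resource "kubernetes_namespace" "example" {
  metadata {
    # hypothetical namespace, only here to show that the provider picks up the cluster credentials
    name = "example"
  }
}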