Terraform failed to destroy with kubernetes autoscaler - kubernetes

I have been using following snippet to manage kubernetes auto scaling with terraform
resource "helm_release" "cluster-autoscaler" {
depends_on = [
module.eks
]
name = "cluster-autoscaler"
namespace = local.k8s_service_account_namespace
repository = "https://kubernetes.github.io/autoscaler"
chart = "cluster-autoscaler"
version = "9.10.7"
create_namespace = false
While all of this has been in working state for months (Gitlab CI/CD), it has suddenly stopped working and throwing following error.
module.review_vpc.helm_release.cluster-autoscaler: Refreshing state... [id=cluster-autoscaler]
╷
│ Error: Kubernetes cluster unreachable: exec plugin: invalid apiVersion "client.authentication.k8s.io/v1alpha1"
│
│ with module.review_vpc.helm_release.cluster-autoscaler,
│ on ..\..\modules\aws\eks.tf line 319, in resource "helm_release" "cluster-autoscaler":
│ 319: resource "helm_release" "cluster-autoscaler" {
I am using AWS EKS with kubernetes version 1.21.
The terraform providers are as follows
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 3.0"
}
kubectl = {
source = "gavinbunney/kubectl"
version = "1.14.0"
}
}
UPDATE 1
Here is the module for eks
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "17.24.0"

I had to do couple of changes to terraform scripts (not sure whey they were not required earlier).
Added helm to required_providers section
helm = {
source = "hashicorp/helm"
version = "2.3.0"
}
Replaced token generation from
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
args = ["eks", "get-token", "--cluster-name", var.eks_cluster_name]
command = "aws"
}
to
token = data.aws_eks_cluster_auth.cluster.token
Note that I am using hashicorp/terraform:1.0.11 image on Gitlab runner to execute Terraform Code. Hence manually installing kubectl or aws CLI is not applicable in my case.

This looks like a Helm v3.9 version issue.
Check which version you are using, if so, just do the downgrade to v3.8.
Don't forget to confirm that you are also using the version of kubectl v1.21 and aws-cli v2.7

Related

Kubernetes cluster unreachable when the vm_size was changed in azurerm_kubernetes_cluster

Terraform version
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "=3.16.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "=2.11.0"
}
helm = {
source = "hashicorp/helm"
version = "=2.6.0"
}
}
required_version = "=1.2.6"
}
Terraform Code
resource "azurerm_kubernetes_cluster" "my_cluster" {
name = local.cluster_name
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
dns_prefix = local.dns_prefix
node_resource_group = local.resource_group_node_name
kubernetes_version = "1.24.3"
automatic_channel_upgrade = "patch"
sku_tier = var.sku_tier
default_node_pool {
name = "default"
type = "VirtualMachineScaleSets"
vm_size = var.default_pool_vm_size
enable_auto_scaling = true
max_count = var.default_pool_max_count
min_count = var.default_pool_min_count
os_disk_type = "Ephemeral"
os_disk_size_gb = var.default_pool_os_disk_size_gb
}
identity {
type = "SystemAssigned"
}
network_profile {
network_plugin = "kubenet"
}
}
provider "helm" {
kubernetes {
host = azurerm_kubernetes_cluster.my_cluster.kube_admin_config.0.host
client_certificate = base64decode(azurerm_kubernetes_cluster.my_cluster.kube_admin_config.0.client_certificate)
client_key = base64decode(azurerm_kubernetes_cluster.my_cluster.kube_admin_config.0.client_key)
cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.my_cluster.kube_admin_config.0.cluster_ca_certificate)
}
}
resource "helm_release" "argocd" {
name = "argocd"
repository = "https://argoproj.github.io/argo-helm"
chart = "argo-cd"
version = "4.10.5"
create_namespace = true
namespace = "argocd"
}
Steps to Reproduce
All resources were created successfully when I executed the terraform code at first time creation.
But it was failed on terraform plan when I changed the vm_size in default node pool.
$ terraform plan
Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
with helm_release.argocd,
│ on argocd.tf line 1, in resource "helm_release" "argocd":
│ 1: resource "helm_release" "argocd" {
Expected Behavior
The cluster should be reachable even the vm_size was changed.
Actual Behavior
Kubernetes cluster is unreachable for other terraform providers (ex: kubernetes, helm)
Test
I removed resource argocd to prevent the above situation, then terraform could plan and apply successfully.
I get the cluster config data from azure portal, the azurePortalFQDN is different between first time creation.
Question
Will The cluster be recreated if I change default node pool config which is commented “Changing this forces a new resource to be created” on terraform documents? Or only the default node pool will be deleted then create new one however the aks cluster doesn't changed?
Why provider helm could connect to the cluster at first time creation, but it was failed to connect when the resource recreation?
Thanks for your reply.

Create Azure AKS with Managed Identity using Terraform gives AutoUpgradePreview not enabled error

I am trying to create an AKS cluster with managed identity using Terraform. This is my code so far, pretty basic and standard from a few documentation and blog posts I found online.
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "2.79.1"
}
}
}
provider "azurerm" {
features {}
use_msi = true
}
resource "azurerm_resource_group" "rg" {
name = "prod_test"
location = "northeurope"
}
resource "azurerm_kubernetes_cluster" "cluster" {
name = "prod_test_cluster"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
dns_prefix = "weak"
default_node_pool {
name = "default"
node_count = "4"
vm_size = "standard_ds3_v2"
}
identity {
type = "SystemAssigned"
}
}
And this is the error message that I can't come around to a solution. Any thoughts on it?
Error: creating Managed Kubernetes Cluster "prod_test_cluster" (Resource Group "prod_test"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="BadRequest" Message="Feature Microsoft.ContainerService/AutoUpgradePreview is not enabled. Please see https://aka.ms/aks/previews for how to enable features."
│
│ with azurerm_kubernetes_cluster.cluster,
│ on main.tf line 19, in resource "azurerm_kubernetes_cluster" "cluster":
│ 19: resource "azurerm_kubernetes_cluster" "cluster" {
│
I tested it on my environment and faced the same issue as you can see below:
So, to give a description on the issue the AutoChannelUpgrade went
to public preview on August 2021. And as per the terraform azurerm provider 2.79.0 , it bydefault passes that value to none in the
backend but as we have not registered for the feature it fails giving
the error Feature Microsoft.ContainerService/AutoUpgradePreview is not enabled.
To confirm you don't have the feature registered you can use the
below command :
az feature show -n AutoUpgradePreview --namespace Microsoft.ContainerService
You will see it not registered as below:
Now to overcome this you can try two solutions as given below:
You can try using terraform azurerm provider 2.78.0 instead of 2.79.1.
Other solution will be to register for the feature and then you can
use the same code that you are using .
You can follow the below steps:
You can use below command to register the feature (it will take around 5
mins to get registered) :
az login --identity
az feature register --namespace Microsoft.ContainerService -n AutoUpgradePreview
After the above is done you can check the registration stauts with below command :
az feature registration show --provider-namespace Microsoft.ContainerService -n AutoUpgradePreview
After the feature status becomes registered you can do a terraform apply to your code .
I tested it using the below code on my VM:
provider "azurerm" {
features {}
subscription_id = "948d4068-xxxxx-xxxxxx-xxxx-e00a844e059b"
tenant_id = "72f988bf-xxxxx-xxxxxx-xxxxx-2d7cd011db47"
use_msi = true
}
resource "azurerm_resource_group" "rg" {
name = "terraformtestansuman"
location = "west us 2"
}
resource "azurerm_kubernetes_cluster" "cluster" {
name = "prod_test_cluster"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
dns_prefix = "weak"
default_node_pool {
name = "default"
node_count = "4"
vm_size = "standard_ds3_v2"
}
identity {
type = "SystemAssigned"
}
}
Outputs:
Reference:
Github Issue
Install Azure CLI if not installed on the VM using Microsoft Installer

terraform plan recreates resources on every run with terraform cloud backend

I am running into an issue where terraform plan recreates resources that don't need to be recreated every run. This is an issue because some of the steps depend on those resources being available, and since they are recreated with each run, the script fails to complete.
My setup is Github Actions, Linode LKE, Terraform Cloud.
My main.tf file looks like this:
terraform {
required_providers {
linode = {
source = "linode/linode"
version = "=1.16.0"
}
helm = {
source = "hashicorp/helm"
version = "=2.1.0"
}
}
backend "remote" {
hostname = "app.terraform.io"
organization = "MY-ORG-HERE"
workspaces {
name = "MY-WORKSPACE-HERE"
}
}
}
provider "linode" {
}
provider "helm" {
debug = true
kubernetes {
config_path = "${local_file.kubeconfig.filename}"
}
}
resource "linode_lke_cluster" "lke_cluster" {
label = "MY-LABEL-HERE"
k8s_version = "1.21"
region = "us-central"
pool {
type = "g6-standard-2"
count = 3
}
}
and my outputs.tf file
resource "local_file" "kubeconfig" {
depends_on = [linode_lke_cluster.lke_cluster]
filename = "kube-config"
# filename = "${path.cwd}/kubeconfig"
content = base64decode(linode_lke_cluster.lke_cluster.kubeconfig)
}
resource "helm_release" "ingress-nginx" {
# depends_on = [local_file.kubeconfig]
depends_on = [linode_lke_cluster.lke_cluster, local_file.kubeconfig]
name = "ingress"
repository = "https://kubernetes.github.io/ingress-nginx"
chart = "ingress-nginx"
}
resource "null_resource" "custom" {
depends_on = [helm_release.ingress-nginx]
# change trigger to run every time
triggers = {
build_number = "${timestamp()}"
}
# download kubectl
provisioner "local-exec" {
command = "curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl && chmod +x kubectl"
}
# apply changes
provisioner "local-exec" {
command = "./kubectl apply -f ./k8s/ --kubeconfig ${local_file.kubeconfig.filename}"
}
}
In Github Actions, I'm running these steps:
jobs:
init-terraform:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ./terraform
steps:
- name: Checkout code
uses: actions/checkout#v2
with:
ref: 'privatebeta-kubes'
- name: Setup Terraform
uses: hashicorp/setup-terraform#v1
with:
cli_config_credentials_token: ${{ secrets.TERRAFORM_API_TOKEN }}
- name: Terraform Init
run: terraform init
- name: Terraform Format Check
run: terraform fmt -check -v
- name: List terraform state
run: terraform state list
- name: Terraform Plan
run: terraform plan
id: plan
env:
LINODE_TOKEN: ${{ secrets.LINODE_TOKEN }}
When I look at the results of terraform state list I can see my resources:
Run terraform state list
terraform state list
shell: /usr/bin/bash -e {0}
env:
TERRAFORM_CLI_PATH: /home/runner/work/_temp/3f9749b8-515b-4cb4-8053-1a6318496321
/home/runner/work/_temp/3f9749b8-515b-4cb4-8053-1a6318496321/terraform-bin state list
helm_release.ingress-nginx
linode_lke_cluster.lke_cluster
local_file.kubeconfig
null_resource.custom
But my terraform plan fails and the issue seems to stem from the fact that those resources try to get recreated.
Run terraform plan
terraform plan
shell: /usr/bin/bash -e {0}
env:
TERRAFORM_CLI_PATH: /home/runner/work/_temp/3f9749b8-515b-4cb4-8053-1a6318496321
LINODE_TOKEN: ***
/home/runner/work/_temp/3f9749b8-515b-4cb4-8053-1a6318496321/terraform-bin plan
Running plan in the remote backend. Output will stream here. Pressing Ctrl-C
will stop streaming the logs, but will not stop the plan running remotely.
Preparing the remote plan...
Waiting for the plan to start...
Terraform v1.0.2
on linux_amd64
Configuring remote state backend...
Initializing Terraform configuration...
linode_lke_cluster.lke_cluster: Refreshing state... [id=31946]
local_file.kubeconfig: Refreshing state... [id=fbb5520298c7c824a8069397ef179e1bc971adde]
helm_release.ingress-nginx: Refreshing state... [id=ingress]
╷
│ Error: Kubernetes cluster unreachable: stat kube-config: no such file or directory
│
│ with helm_release.ingress-nginx,
│ on outputs.tf line 8, in resource "helm_release" "ingress-nginx":
│ 8: resource "helm_release" "ingress-nginx" {
Is there a way to tell terraform it doesn't need to recreate those resources?
Regarding the actual error shown, Error: Kubernetes cluster unreachable: stat kibe-config: no such file or directory... which is referencing your outputs file... I found this which could help with your specific error: https://github.com/hashicorp/terraform-provider-helm/issues/418
1 other thing looks strange to me. Why does your outputs.tf refer to 'resources' & not 'outputs'. Shouldn't your outputs.tf look like this?
output "local_file_kubeconfig" {
value = "reference.to.resource"
}
Also I see your state file / backend config looks like it's properly configured.
I recommend logging into your terraform cloud account to verify that the workspace is indeed there, as expected. It's the state file that tells terraform not to re-create the resources it manages.
If the resources are already there and terraform is trying to re-create them, that could indicate that those resources were created prior to using terraform or possibly within another terraform cloud workspace or plan.
Did you end up renaming your backend workspace at any point with this plan? I'm referring to your main.tf file, this part where it says MY-WORKSPACE-HERE :
terraform {
required_providers {
linode = {
source = "linode/linode"
version = "=1.16.0"
}
helm = {
source = "hashicorp/helm"
version = "=2.1.0"
}
}
backend "remote" {
hostname = "app.terraform.io"
organization = "MY-ORG-HERE"
workspaces {
name = "MY-WORKSPACE-HERE"
}
}
}
Unfortunately I am not a kurbenetes expert, so possibly more help can be used there.

How to install AGIC in Kubernetes cluster using Terraform

I am trying to install AGIC in AKS using Terraform. I am following this document https://learn.microsoft.com/en-us/azure/terraform/terraform-create-k8s-cluster-with-aks-applicationgateway-ingress but this document shows partial terraform deployment i want to fully automate it with the help of Terraform. Is there any other document/way to do this?
Of course, you can use the Terraform to deploy the Helm charts to the AKS. And here is an example for deploying Helm charts through Terraform:
data "helm_repository" "stable" {
name = "stable"
url = "https://kubernetes-charts.storage.googleapis.com"
}
resource "helm_release" "example" {
name = "my-redis-release"
repository = data.helm_repository.stable.metadata[0].name
chart = "redis"
version = "6.0.1"
values = [
"${file("values.yaml")}"
]
set {
name = "cluster.enabled"
value = "true"
}
set {
name = "metrics.enabled"
value = "true"
}
set_string {
name = "service.annotations.prometheus\\.io/port"
value = "9127"
}
}
And you can also configure the certificate of the AKS to deploy the Helm charts through Terraform, take a look at the document here.

Deploying helm charts via Terraform Helm provider and Azure DevOps while fetching the helm charts from ACR

I am trying to deploy the helm charts from ACR to an AKS cluster using Terraform helm provider and Azure DevOps container job but it fails while fetching the helm chart from ACR. Please let me know what is going wrong.
helm provider tf module:
data "helm_repository" "cluster_rbac_helm_chart_repo" {
name = "mcp-rbac-cluster"
url = "https://mcpshareddcr.azurecr.io"
}
# Deploy Cluster RBAC helm chart onto the cluster
resource "helm_release" "cluster_rbac_helm_chart_release" {
name = "mcp-rbac-cluster"
repository = data.helm_repository.cluster_rbac_helm_chart_repo.metadata[0].name
chart = "mcp-rbac-cluster"
}
provider:
version = "=1.36.0"
tenant_id = var.ARM_TENANT_ID
subscription_id = var.ARM_SUBSCRIPTION_ID
client_id = var.ARM_CLIENT_ID
client_secret = var.ARM_CLIENT_SECRET
skip_provider_registration = true
}
data "azurerm_kubernetes_cluster" "aks_cluster" {
name = var.aks_cluster
resource_group_name = var.resource_group_aks
}
locals {
kubeconfig_path = "/tmp/kubeconfig"
}
resource "local_file" "kubeconfig" {
filename = local.kubeconfig_path
content = data.azurerm_kubernetes_cluster.aks_cluster.kube_admin_config_raw
}
provider "helm" {
home = "resources/.helm"
kubernetes {
load_config_file = true
config_path = local.kubeconfig_path
}
}
module "aks_resources" {
source = "./modules/helm/aks-resources"
}
error:
Error: Looks like "" is not a valid chart repository or cannot be reached: Failed to fetch /index.yaml : 404 Not Found
Until now, Helm still doesn't support directly installing chart from an OCI registry.
The recommended steps are:
helm chart remove mycontainerregistry.azurecr.io/helm/hello-world:v1
helm chart pull mycontainerregistry.azurecr.io/helm/hello-world:v1
helm chart export mycontainerregistry.azurecr.io/helm/hello-world:v1 --destination ./install
cd install & helm install myhelmtest ./hello-world
So my solution is:
resource "null_resource" "download_chart" {
provisioner "local-exec" {
command = <<-EOT
export HELM_EXPERIMENTAL_OCI=1
helm registry login mycontainerregistry.azurecr.io --username someuser --password somepass
helm chart remove mycontainerregistry.azurecr.io/helm/hello-world:v1
helm chart pull mycontainerregistry.azurecr.io/helm/hello-world:v1
helm chart export mycontainerregistry.azurecr.io/helm/hello-world:v1 --destination ./install
EOT
}
}
resource "helm_release" "chart" {
name = "hello_world"
repository = "./install"
chart = "hello-world"
version = "v1"
depends_on = [null_resource.download_chart]
}
Not perfect but works.
The problem is that you use the wrong url in the Terraform helm_repository. The right url for ACR looks like this:
https://acrName.azurecr.io/helm/v1/repo
And the ACR is a private registry, so it means you need to add the username and password for it. Finally, your Terraform code should like this with version 2.0+ of helm provider:
resource "helm_release" "my-chart" {
name = "my-chart"
chart = "my/chart"
repository = "https://${var.acr_name}.azurecr.io/helm/v1/repo"
repository_username = var.acr_user_name
repository_password = var.acr_user_password
}
Or with 1.x helm provider:
data "helm_repository" "cluster_rbac_helm_chart_repo" {
name = "mcp-rbac-cluster"
url = "https://mcpshareddcr.azurecr.io/helm/v1/repo"
username = "xxxxx"
password = "xxxxx"
}
# Deploy Cluster RBAC helm chart onto the cluster
resource "helm_release" "cluster_rbac_helm_chart_release" {
name = "mcp-rbac-cluster"
repository = data.helm_repository.cluster_rbac_helm_chart_repo.metadata[0].name
chart = "mcp-rbac-cluster"
}
Update
Here is the screenshot that it works well and deploy the charts in the AKS:
Small enhancement to the above solution. Include a trigger to force a download of the chart every time. Otherwise, it expects that you always maintain the local copy of the chart post the first deployment
resource "null_resource" "download_chart" {
triggers = {
always_run = timestamp()
}
provisioner "local-exec" {
command = <<-EOT
export HELM_EXPERIMENTAL_OCI=1
helm registry login ${var.registry_fqdn} --username ${var.acr_client_id} --password ${var.acr_client_secret}
helm chart remove ${var.registry_fqdn}/helm/${var.chart_name}:${var.chart_tag}
helm chart pull ${var.registry_fqdn}/helm/${var.chart_name}:${var.chart_tag}
helm chart export ${var.registry_fqdn}/helm/${var.chart_name}:${var.chart_tag} --destination ./install
EOT
}
}