Terraform Unable to find Helm Release charts - kubernetes

I'm running Kubernetes on GCP and making changes via Terraform v0.11.14.
When running terraform plan I get the following errors:
Error: Error refreshing state: 2 errors occurred:
* module.cls-xxx-us-central1-a-dev.helm_release.cert-manager: 1 error occurred:
* module.cls-xxx-us-central1-a-dev.helm_release.cert-manager: helm_release.cert-manager: error installing: the server could not find the requested resource
* module.cls-xxx-us-central1-a-dev.helm_release.nginx: 1 error occurred:
* module.cls-xxx-us-central1-a-dev.helm_release.nginx: helm_release.nginx: error installing: the server could not find the requested resource
Here's a copy of my helm.tf
resource "helm_release" "nginx" {
depends_on = ["google_container_node_pool.tally-np"]
name = "ingress-nginx"
chart = "ingress-nginx/ingress-nginx"
namespace = "kube-system"
}
resource "helm_release" "cert-manager" {
depends_on = ["google_container_node_pool.tally-np"]
name = "cert-manager"
chart = "stable/cert-manager"
namespace = "kube-system"
set {
name = "ingressShim.defaultIssuerName"
value = "letsencrypt-production"
}
set {
name = "ingressShim.defaultIssuerKind"
value = "ClusterIssuer"
}
provisioner "local-exec" {
command = "gcloud container clusters get-credentials ${var.cluster_name} --zone ${google_container_cluster.cluster.zone} && kubectl create -f ${path.module}/letsencrypt-prod.yaml"
}
}
I've read that Helm deprecated most of the old chart repos, so I tried adding the repositories and installing the charts locally under the kube-system namespace, but the issue still persists.
Here's the list of versions for Terraform and its providers:
Terraform v0.11.14
provider.google v2.17.0
provider.helm v0.10.2
provider.kubernetes v1.9.0
provider.random v2.2.1

As the community moves towards Helm v3, the maintainers have deprecated the old Helm model where there was a single mono repo called stable; in the new model each product has its own repo. On November 13, 2020 the stable and incubator chart repositories reached the end of development and became archives.
The archived charts are now hosted at a new URL. To continue using the archived charts, you will have to make some tweaks in your Helm workflow.
Sample workaround:
helm repo add new-stable https://charts.helm.sh/stable
helm fetch new-stable/prometheus-operator
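In Terraform terms, the equivalent fix is to point each helm_release at an explicit chart repository URL instead of the retired stable shortcut. A minimal sketch, assuming your helm provider version supports the repository argument (the ingress-nginx and Jetstack URLs below are the publicly documented chart repos; adjust names to your setup):
resource "helm_release" "nginx" {
  name       = "ingress-nginx"
  repository = "https://kubernetes.github.io/ingress-nginx"
  chart      = "ingress-nginx"
  namespace  = "kube-system"
}

resource "helm_release" "cert-manager" {
  name       = "cert-manager"
  # archived stable charts live at https://charts.helm.sh/stable;
  # current cert-manager releases are published at https://charts.jetstack.io
  repository = "https://charts.helm.sh/stable"
  chart      = "cert-manager"
  namespace  = "kube-system"
}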

Related

Helm reads wrong Kubeversion: >=1.22.0-0 for v1.23.0 as v1.20.0

How to deploy on K8 via Pulumi using the ArgoCD Helm Chart?
Pulumi up Diagnostics:
kubernetes:helm.sh/v3:Release (argocd):
error: failed to create chart from template: chart requires kubeVersion: >=1.22.0-0 which is incompatible with Kubernetes v1.20.0
The cluster version is v1.23.0 (verified on AWS), not 1.20.0.
ArgoCD install yaml used with CRD2Pulumi: https://raw.githubusercontent.com/argoproj/argo-cd/master/manifests/core-install.yaml
Source:
...
cluster = eks.Cluster("argo-example")  # version="1.23"

# Cluster provider
provider = k8s.Provider(
    "eks",
    kubeconfig=cluster.kubeconfig.apply(lambda k: json.dumps(k))
    # kubeconfig=cluster.kubeconfig
)

ns = k8s.core.v1.Namespace(
    'argocd',
    metadata={
        "name": "argocd",
    },
    opts=pulumi.ResourceOptions(
        provider=provider
    )
)

argo = k8s.helm.v3.Release(
    "argocd",
    args=k8s.helm.v3.ReleaseArgs(
        chart="argo-cd",
        namespace=ns.metadata.name,
        repository_opts=k8s.helm.v3.RepositoryOptsArgs(
            repo="https://argoproj.github.io/argo-helm"
        ),
        values={
            "server": {
                "service": {
                    "type": "LoadBalancer",
                }
            }
        },
    ),
    opts=pulumi.ResourceOptions(provider=provider, parent=ns),
)
Any ideas as to fixing this oddity between the version error and the actual cluster version?
I've tried:
Deleting everything and starting over.
Updating to the latest ArgoCD install yaml.
I could reproduce your issue, though I am not quite sure what causes the mismatch between the versions. It would be best to open an issue at Pulumi's k8s repository.
Looking at the history of https://github.com/argoproj/argo-helm/blame/main/charts/argo-cd/Chart.yaml, you can see that the kubeVersion requirement was added after 5.9.1, so pinning that chart version deploys the Helm chart successfully. E.g.
import * as k8s from "@pulumi/kubernetes";

const namespaceName = "argo";
const namespace = new k8s.core.v1.Namespace("namespace", {
    metadata: {
        name: namespaceName,
    }
});

const argo = new k8s.helm.v3.Release("argo", {
    repositoryOpts: {
        repo: "https://argoproj.github.io/argo-helm"
    },
    chart: "argo-cd",
    version: "5.9.1",
    namespace: namespace.metadata.name,
})
(Not Recommended) Alternatively, you could also clone the source code of the chart, comment out the kubeVersion requirement in Chart.yaml and install the chart from your local path.
Upgrade Helm. I had a similar issue where my k8s was 1.25 but Helm complained it was 1.20. I tried everything else; upgrading Helm worked.

Terraform running in Azure Pipeline attempting to install azcli provider

I'm running Terraform in an Azure Pipeline (something I have experience of doing) and for some reason the init step is attempting to install a provider for azcli, which I don't think exists. This does not happen when I run Terraform on my local machine.
My providers file is:
terraform {
  required_version = ">=0.13"

  backend "azurerm" {
    container_name = "tfstate"
    key            = "terraform.tfstate"
  }

  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = "=1.5.0"
    }
  }
}
The init step fails while trying to install the azcli provider. I'm not sure why Terraform is trying to install it; I don't think the provider even exists. Has anyone seen this before?
Terraform searches directly and indirectly for providers during initialization. It is possible there is a mistake in a resource name or provider definition. Search your codebase for azcli. For example:
▶ cat .\main.tf
resource "azcli_test" "test" {
test = "true"
}
~\projects\test\t5 ◷ 10:10:21 AM
▶ C:\Users\pearcec\bin\terraform init
Initializing the backend...
Initializing provider plugins...
- Finding latest version of hashicorp/azcli...
Error: Failed to install provider
Error while installing hashicorp/azcli: provider registry
registry.terraform.io does not have a provider named
registry.terraform.io/hashicorp/azcli
or
~\projects\test\t5 ◷ 10:10:23 AM
▶ cat .\main.tf
provider "azcli" {
features {}
}
~\projects\test\t5 ◷ 10:13:41 AM
▶ C:\Users\pearcec\bin\terraform init
Initializing the backend...
Initializing provider plugins...
- Finding latest version of hashicorp/azcli...
Error: Failed to install provider
Error while installing hashicorp/azcli: provider registry
registry.terraform.io does not have a provider named
registry.terraform.io/hashicorp/azcli
or
▶ cat .\main.tf
terraform {
  required_providers {
    azcli = {
      source = "-/azcli"
    }
  }
}
~\projects\test\t5 ◷ 10:16:09 AM
▶ C:\Users\pearcec\bin\terraform init
Initializing the backend...
Initializing provider plugins...
- Finding latest version of -/azcli...
Error: Failed to query available provider packages
Could not retrieve the list of available versions for provider -/azcli:
provider registry registry.terraform.io does not have a provider named
registry.terraform.io/-/azcli
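Once the stray azcli reference is removed (or replaced with the provider that was actually intended), only providers that really exist in the registry should remain in required_providers. A minimal sketch, assuming the pipeline was meant to use the azurerm provider alongside grafana (the azurerm version constraint is illustrative):
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 2.0"
    }
    grafana = {
      source  = "grafana/grafana"
      version = "=1.5.0"
    }
  }
}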

Create GKE cluster and namespace with Terraform

I need to create a GKE cluster, then create a namespace and install a DB through Helm into that namespace. Right now I have gke-cluster.tf, which creates the cluster with a node pool, and helm.tf, which has the kubernetes provider and a helm_release resource. It first creates the cluster, but then tries to install the DB before the namespace exists, so I have to run terraform apply again and then it works. I want to avoid a multi-folder setup and run terraform apply only once. What's the good practice for a situation like this? Thanks for the answers.
The create_namespace argument of the helm_release resource can help you:
create_namespace - (Optional) Create the namespace if it does not yet exist. Defaults to false.
https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release#create_namespace
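A minimal sketch of that approach (release name, chart and namespace are illustrative, taken from the examples further down):
resource "helm_release" "mydatabase" {
  name             = "mydatabase"
  chart            = "stable/mariadb"
  namespace        = "prod"
  create_namespace = true # the provider creates the namespace if it does not exist yet
}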
Alternatively, you can define a dependency between the namespace resource and helm_release like below:
resource "kubernetes_namespace" "prod" {
metadata {
annotations = {
name = "prod-namespace"
}
labels = {
namespace = "prod"
}
name = "prod"
}
}
resource "helm_release" "arango-crd" {
name = "arango-crd"
chart = "./kube-arangodb-crd"
namespace = "prod"
depends_on = [ kubernetes_namespace.prod ]
}
The solution posted by user adp is correct, but I wanted to give more insight on using Terraform for this particular example with regard to running a single command:
$ terraform apply --auto-approve
Based on the following comments:
Can you tell how are you creating your namespace? Is it with kubernetes provider? - Dawid Kruk
resource "kubernetes_namespace" - Jozef Vrana
This setup needs a specific order of execution: first the cluster, then the resources. By default Terraform will try to create all of the resources at the same time, so it is crucial to use the depends_on = [VALUE] parameter.
The next issue is that the kubernetes provider will try to fetch the credentials at the start of the process from ~/.kube/config. It will not wait for the cluster provisioning to get the actual credentials. It could:
fail when there is no .kube/config
fetch credentials for the wrong cluster.
There is an ongoing feature request to resolve this kind of use case (there are also some workarounds):
Github.com: Hashicorp: Terraform: Issue: depends_on for providers
As an example:
# Create cluster
resource "google_container_cluster" "gke-terraform" {
  project            = "PROJECT_ID"
  name               = "gke-terraform"
  location           = var.zone
  initial_node_count = 1
}

# Get the credentials
resource "null_resource" "get-credentials" {
  depends_on = [google_container_cluster.gke-terraform]

  provisioner "local-exec" {
    command = "gcloud container clusters get-credentials ${google_container_cluster.gke-terraform.name} --zone=europe-west3-c"
  }
}

# Create a namespace
resource "kubernetes_namespace" "awesome-namespace" {
  depends_on = [null_resource.get-credentials]

  metadata {
    name = "awesome-namespace"
  }
}
Assuming that you had earlier configured a cluster to work on and didn't delete it, credentials for that Kubernetes cluster were already fetched. Then:
Terraform will create a cluster named gke-terraform.
Terraform will run a local command to get the credentials for the gke-terraform cluster.
Terraform will create a namespace (using the old information):
if you had another cluster configured in .kube/config, it will create the namespace in that (previous) cluster
if you deleted your previous cluster, it will try to create the namespace in that (previous) cluster and fail
if you had no .kube/config at all, it will fail at the start
Important!
Using "helm_release" resource seems to get the credentials when provisioning the resources, not at the start!
As said you can use helm provider to provision the resources on your cluster to avoid the issues I described above.
Example on running a single command for creating a cluster and provisioning resources on it:
variable "zone" {
  type    = string
  default = "europe-west3-c"
}

resource "google_container_cluster" "gke-terraform" {
  project            = "PROJECT_ID"
  name               = "gke-terraform"
  location           = var.zone
  initial_node_count = 1
}

data "google_container_cluster" "gke-terraform" {
  project  = "PROJECT_ID"
  name     = "gke-terraform"
  location = var.zone
}

resource "null_resource" "get-credentials" {
  # do not start before resource gke-terraform is provisioned
  depends_on = [google_container_cluster.gke-terraform]

  provisioner "local-exec" {
    command = "gcloud container clusters get-credentials ${google_container_cluster.gke-terraform.name} --zone=${var.zone}"
  }
}

resource "helm_release" "mydatabase" {
  name  = "mydatabase"
  chart = "stable/mariadb"

  # do not start before the get-credentials resource is run
  depends_on = [null_resource.get-credentials]

  set {
    name  = "mariadbUser"
    value = "foo"
  }

  set {
    name  = "mariadbPassword"
    value = "qux"
  }
}
Using the above configuration will yield:
data.google_container_cluster.gke-terraform: Refreshing state...
google_container_cluster.gke-terraform: Creating...
google_container_cluster.gke-terraform: Still creating... [10s elapsed]
<--OMITTED-->
google_container_cluster.gke-terraform: Still creating... [2m30s elapsed]
google_container_cluster.gke-terraform: Creation complete after 2m38s [id=projects/PROJECT_ID/locations/europe-west3-c/clusters/gke-terraform]
null_resource.get-credentials: Creating...
null_resource.get-credentials: Provisioning with 'local-exec'...
null_resource.get-credentials (local-exec): Executing: ["/bin/sh" "-c" "gcloud container clusters get-credentials gke-terraform --zone=europe-west3-c"]
null_resource.get-credentials (local-exec): Fetching cluster endpoint and auth data.
null_resource.get-credentials (local-exec): kubeconfig entry generated for gke-terraform.
null_resource.get-credentials: Creation complete after 1s [id=4191245626158601026]
helm_release.mydatabase: Creating...
helm_release.mydatabase: Still creating... [10s elapsed]
<--OMITTED-->
helm_release.mydatabase: Still creating... [1m40s elapsed]
helm_release.mydatabase: Creation complete after 1m44s [id=mydatabase]
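For completeness, a commonly used workaround from the linked issue is to configure the kubernetes (and helm) provider directly from the cluster resource's outputs instead of relying on ~/.kube/config. A minimal sketch, assuming the google provider's google_client_config data source is available (exact provider arguments vary between provider versions):
data "google_client_config" "default" {}

provider "kubernetes" {
  # older kubernetes provider versions (< 2.0) also need load_config_file = false here
  host                   = "https://${google_container_cluster.gke-terraform.endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(google_container_cluster.gke-terraform.master_auth[0].cluster_ca_certificate)
}

provider "helm" {
  kubernetes {
    host                   = "https://${google_container_cluster.gke-terraform.endpoint}"
    token                  = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(google_container_cluster.gke-terraform.master_auth[0].cluster_ca_certificate)
  }
}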

Unable to create windows nodepool on GKE cluster with google terraform GKE module

I am trying to provision a GKE cluster with a Windows node_pool using the Google modules. I am calling the module:
source = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"
version = "9.2.0"
I had to define two pools, one Linux pool required by GKE and the Windows one we need. Terraform always succeeds in provisioning the Linux node_pool but fails to provision the Windows one with the error message below:
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m31s elapsed]
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m41s elapsed]
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m51s elapsed]
module.gke.google_container_cluster.primary: Modifications complete after 24m58s [id=projects/xx-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev]
module.gke.google_container_node_pool.pools["windows-node-pool"]: Creating...
Error: error creating NodePool: googleapi: Error 400: Workload Identity is not supported on Windows nodes. Create the nodepool without workload identity by specifying --workload-metadata=GCE_METADATA., badRequest
on .terraform\modules\gke\terraform-google-kubernetes-engine-9.2.0\modules\beta-private-cluster-update-variant\cluster.tf line 341, in resource "google_container_node_pool" "pools":
341: resource "google_container_node_pool" "pools" {
I tried setting these metadata values in many places but I couldn't get it right.
From the Terraform side:
I tried adding this metadata inside the node_config scope in the module itself, and in my main.tf file where I call the module. I also tried adding it to the windows node_pool scope of the node_pools list, but it wasn't accepted; the message said that setting WORKLOAD IDENTITY isn't expected there.
I also tried setting enable_shielded_nodes = false, but this didn't really help much.
I then tried to test whether this is doable at all through the command line. These were my commands:
C:\>gcloud container node-pools --region europe-west2 list
NAME MACHINE_TYPE DISK_SIZE_GB NODE_VERSION
default-node-pool-d916 n1-standard-2 100 1.17.9-gke.600
C:\>gcloud container node-pools --region europe-west2 create window-node-pool --cluster=gke-nonpci-dev --image-type=WINDOWS_SAC --no-enable-autoupgrade --machine-type=n1-standard-2
WARNING: Starting in 1.12, new node pools will be created with their legacy Compute Engine instance metadata APIs disabled by default. To create a node pool with legacy instance metadata endpoints disabled, run `node-pools create` with the flag `--metadata disable-legacy-endpoints=true`.
This will disable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Workload Identity is not supported on Windows nodes. Create the nodepool without workload identity by specifying --workload-metadata=GCE_METADATA.
C:\>gcloud container node-pools --region europe-west2 create window-node-pool --cluster=gke-nonpci-dev --image-type=WINDOWS_SAC --no-enable-autoupgrade --machine-type=n1-standard-2 --workload-metadata=GCE_METADATA --metadata disable-legacy-endpoints=true
This will disable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Service account "874988475980-compute@developer.gserviceaccount.com" does not exist.
C:\>gcloud auth list
Credentialed Accounts
ACTIVE ACCOUNT
* tf-xxx-xxx-xx-xxx@xx-xxx-xx-xxx-xxxx.iam.gserviceaccount.com
The service account shown by gcloud auth list is the one I am running Terraform with, but I don't know where the account in the error message comes from. Since creating the Windows node pool through the command line as shown above also didn't work, I am a bit stuck and don't know what to do.
Module 9.2.0 is a stable module for us across all the Linux-based clusters we set up before, so I thought it may simply be too old for a Windows node_pool. I used 11.0.0 instead to see if that would make any difference, but ended up with a different error:
module.gke.google_container_node_pool.pools["default-node-pool"]: Refreshing state... [id=projects/uk-tix-p1-npe-b821/locations/europe-west2/clusters/gke-nonpci-dev/nodePools/default-node-pool-d916]
Error: failed to execute ".terraform/modules/gke.gcloud_delete_default_kube_dns_configmap/terraform-google-gcloud-1.4.1/scripts/check_env.sh": fork/exec .terraform/modules/gke.gcloud_delete_default_kube_dns_configmap/terraform-google-gcloud-1.4.1/scripts/check_env.sh: %1 is not a valid Win32 application.
on .terraform\modules\gke.gcloud_delete_default_kube_dns_configmap\terraform-google-gcloud-1.4.1\main.tf line 70, in data "external" "env_override":
70: data "external" "env_override" {
Error: failed to execute ".terraform/modules/gke.gcloud_wait_for_cluster/terraform-google-gcloud-1.3.0/scripts/check_env.sh": fork/exec .terraform/modules/gke.gcloud_wait_for_cluster/terraform-google-gcloud-1.3.0/scripts/check_env.sh: %1 is not a valid Win32 application.
on .terraform\modules\gke.gcloud_wait_for_cluster\terraform-google-gcloud-1.3.0\main.tf line 70, in data "external" "env_override":
70: data "external" "env_override" {
This is how I set the node_pools parameters:
node_pools = [
  {
    name               = "linux-node-pool"
    machine_type       = var.nodepool_instance_type
    min_count          = 1
    max_count          = 10
    disk_size_gb       = 100
    disk_type          = "pd-standard"
    image_type         = "COS"
    auto_repair        = true
    auto_upgrade       = true
    service_account    = google_service_account.gke_cluster_sa.email
    preemptible        = var.preemptible
    initial_node_count = 1
  },
  {
    name               = "windows-node-pool"
    machine_type       = var.nodepool_instance_type
    min_count          = 1
    max_count          = 10
    disk_size_gb       = 100
    disk_type          = "pd-standard"
    image_type         = var.nodepool_image_type
    auto_repair        = true
    auto_upgrade       = true
    service_account    = google_service_account.gke_cluster_sa.email
    preemptible        = var.preemptible
    initial_node_count = 1
  }
]

cluster_resource_labels = var.cluster_resource_labels

# health check and webhook firewall rules
node_pools_tags = {
  all = [
    "xx-xxx-xxx-local-xxx",
  ]
}

node_pools_metadata = {
  all = {
    // workload-metadata = "GCE_METADATA"
  }
  linux-node-pool = {
    ssh-keys               = join("\n", [for user, key in var.node_ssh_keys : "${user}:${key}"])
    block-project-ssh-keys = true
  }
  windows-node-pool = {
    workload-metadata = "GCE_METADATA"
  }
}
This is a shared VPC where I provision my cluster. The cluster version is 1.17.9-gke.600.
Check out https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/issues/632 for the solution.
The error message is ambiguous and GKE has an internal bug tracking this issue. We will improve the error message soon.
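For reference, the --workload-metadata=GCE_METADATA hint from the error message maps to the workload_metadata_config block on a standalone google_container_node_pool resource. This is only a hedged sketch, not the module's own syntax: the cluster reference is an assumption, and on google provider 3.x the field was called node_metadata with different accepted values, so check the provider docs for your version:
resource "google_container_node_pool" "windows_pool" {
  # hypothetical standalone pool, shown only to illustrate the gcloud flag's Terraform equivalent
  name     = "windows-node-pool"
  cluster  = google_container_cluster.primary.name # assumed cluster resource name
  location = "europe-west2"

  node_config {
    image_type   = "WINDOWS_SAC"
    machine_type = "n1-standard-2"

    # equivalent of `--workload-metadata=GCE_METADATA`: no workload identity on this pool
    workload_metadata_config {
      mode = "GCE_METADATA" # field name on google provider >= 4.x
    }
  }
}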

Invalid Slug version in terraform

I am trying to create a Kubernetes cluster with Terraform, but it shows me an error. I have changed the value of version several times, but it did not work.
resource "digitalocean_kubernetes_cluster" "lox" {
name = "lox"
region = "nyc1"
version = "1.13.4-do.0"
node_pool {
name = "worker-pool"
size = "s-1vcpu-2gb"
node_count = 2
}
This is the error:
Error: Error creating Kubernetes cluster: POST https://api.digitalocean.com/v2/kubernetes/clusters: 422 validation error: invalid version slug
on 01-cluster.tf line 1, in resource "digitalocean_kubernetes_cluster" "lox":
1: resource "digitalocean_kubernetes_cluster" "lox" {
How can I solve it?
Use the command below to list the currently valid version slugs and use one of them in version:
doctl kubernetes options versions
The version you're setting does not exist.
Check here: https://www.digitalocean.com/docs/kubernetes/changelog/ for all the available versions, or use the doctl command line.
If you're targeting 1.13, you may use 1.13.12-do.8 as the version, released on 22/06/2020.
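For example, the resource from the question could then look like this (a sketch; the 1.13.12-do.8 slug comes from this answer and may no longer be offered, so check doctl kubernetes options versions first):
resource "digitalocean_kubernetes_cluster" "lox" {
  name    = "lox"
  region  = "nyc1"
  version = "1.13.12-do.8" # must be a currently valid slug

  node_pool {
    name       = "worker-pool"
    size       = "s-1vcpu-2gb"
    node_count = 2
  }
}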
I wasn't able to find the version in the changelog; I found it here instead: https://slugs.do-api.dev/ (tab "Kubernetes versions").
doctl kubernetes options versions
Slug            Kubernetes Version    Supported Features
1.24.4-do.0     1.24.4                cluster-autoscaler, docr-integration, ha-control-plane, token-authentication
1.23.10-do.0    1.23.10               cluster-autoscaler, docr-integration, ha-control-plane, token-authentication
1.22.13-do.0    1.22.13               cluster-autoscaler, docr-integration, ha-control-plane, token-authentication