Helm - Kubernetes cluster unreachable: the server has asked for the client to provide credentials - kubernetes

I'm trying to deploy an EKS self managed with Terraform. While I can deploy the cluster with addons, vpc, subnet and all other resources, it always fails at helm:
Error: Kubernetes cluster unreachable: the server has asked for the client to provide credentials
with module.eks-ssp-kubernetes-addons.module.ingress_nginx[0].helm_release.nginx[0]
on .terraform/modules/eks-ssp-kubernetes-addons/modules/kubernetes-addons/ingress-nginx/main.tf line 19, in resource "helm_release" "nginx":
resource "helm_release" "nginx" {
This error repeats for metrics_server, lb_ingress, argocd, but cluster-autoscaler throws:
Warning: Helm release "cluster-autoscaler" was created but has a failed status.
with module.eks-ssp-kubernetes-addons.module.cluster_autoscaler[0].helm_release.cluster_autoscaler[0]
on .terraform/modules/eks-ssp-kubernetes-addons/modules/kubernetes-addons/cluster-autoscaler/main.tf line 1, in resource "helm_release" "cluster_autoscaler":
resource "helm_release" "cluster_autoscaler" {
My main.tf looks like this:
terraform {
backend "remote" {}
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 3.66.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = ">= 2.7.1"
}
helm = {
source = "hashicorp/helm"
version = ">= 2.4.1"
}
}
}
data "aws_eks_cluster" "cluster" {
name = module.eks-ssp.eks_cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
name = module.eks-ssp.eks_cluster_id
}
provider "aws" {
access_key = "xxx"
secret_key = "xxx"
region = "xxx"
assume_role {
role_arn = "xxx"
}
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
}
provider "helm" {
kubernetes {
host = data.aws_eks_cluster.cluster.endpoint
token = data.aws_eks_cluster_auth.cluster.token
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
}
}
My eks.tf looks like this:
module "eks-ssp" {
source = "github.com/aws-samples/aws-eks-accelerator-for-terraform"
# EKS CLUSTER
tenant = "DevOpsLabs2b"
environment = "dev-test"
zone = ""
terraform_version = "Terraform v1.1.4"
# EKS Cluster VPC and Subnet mandatory config
vpc_id = "xxx"
private_subnet_ids = ["xxx","xxx", "xxx", "xxx"]
# EKS CONTROL PLANE VARIABLES
create_eks = true
kubernetes_version = "1.19"
# EKS SELF MANAGED NODE GROUPS
self_managed_node_groups = {
self_mg = {
node_group_name = "DevOpsLabs2b"
subnet_ids = ["xxx","xxx", "xxx", "xxx"]
create_launch_template = true
launch_template_os = "bottlerocket" # amazonlinux2eks or bottlerocket or windows
custom_ami_id = "xxx"
public_ip = true # Enable only for public subnets
pre_userdata = <<-EOT
yum install -y amazon-ssm-agent \
systemctl enable amazon-ssm-agent && systemctl start amazon-ssm-agent \
EOT
disk_size = 10
instance_type = "t2.small"
desired_size = 2
max_size = 10
min_size = 0
capacity_type = "" # Optional Use this only for SPOT capacity as capacity_type = "spot"
k8s_labels = {
Environment = "dev-test"
Zone = ""
WorkerType = "SELF_MANAGED_ON_DEMAND"
}
additional_tags = {
ExtraTag = "t2x-on-demand"
Name = "t2x-on-demand"
subnet_type = "public"
}
create_worker_security_group = false # Creates a dedicated sec group for this Node Group
},
}
}
enable_amazon_eks_vpc_cni = true
amazon_eks_vpc_cni_config = {
addon_name = "vpc-cni"
addon_version = "v1.7.5-eksbuild.2"
service_account = "aws-node"
resolve_conflicts = "OVERWRITE"
namespace = "kube-system"
additional_iam_policies = []
service_account_role_arn = ""
tags = {}
}
enable_amazon_eks_kube_proxy = true
amazon_eks_kube_proxy_config = {
addon_name = "kube-proxy"
addon_version = "v1.19.8-eksbuild.1"
service_account = "kube-proxy"
resolve_conflicts = "OVERWRITE"
namespace = "kube-system"
additional_iam_policies = []
service_account_role_arn = ""
tags = {}
}
#K8s Add-ons
enable_aws_load_balancer_controller = true
enable_metrics_server = true
enable_cluster_autoscaler = true
enable_aws_for_fluentbit = true
enable_argocd = true
enable_ingress_nginx = true
depends_on = [module.eks-ssp.self_managed_node_groups]
}

OP has confirmed in the comment that the problem was resolved:
Of course. I think I found the issue. Doing "kubectl get svc" throws: "An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:iam::xxx:user/terraform_deploy is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::xxx:user/terraform_deploy"
Solved it by using my actual role, that's crazy. No idea why it was calling itself.
For similar problem look also this issue.

I solved this error by adding dependencies in the helm installations.
The depends_on will wait for the step to successfully complete and then helm module runs.
module "nginx-ingress" {
depends_on = [module.eks, module.aws-load-balancer-controller]
source = "terraform-module/release/helm"
...}
module "aws-load-balancer-controller" {
depends_on = [module.eks]
source = "terraform-module/release/helm"
...}
module "helm_autoscaler" {
depends_on = [module.eks]
source = "terraform-module/release/helm"
...}

Related

Kubernetes cluster unreachable when the vm_size was changed in azurerm_kubernetes_cluster

Terraform version
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "=3.16.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "=2.11.0"
}
helm = {
source = "hashicorp/helm"
version = "=2.6.0"
}
}
required_version = "=1.2.6"
}
Terraform Code
resource "azurerm_kubernetes_cluster" "my_cluster" {
name = local.cluster_name
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
dns_prefix = local.dns_prefix
node_resource_group = local.resource_group_node_name
kubernetes_version = "1.24.3"
automatic_channel_upgrade = "patch"
sku_tier = var.sku_tier
default_node_pool {
name = "default"
type = "VirtualMachineScaleSets"
vm_size = var.default_pool_vm_size
enable_auto_scaling = true
max_count = var.default_pool_max_count
min_count = var.default_pool_min_count
os_disk_type = "Ephemeral"
os_disk_size_gb = var.default_pool_os_disk_size_gb
}
identity {
type = "SystemAssigned"
}
network_profile {
network_plugin = "kubenet"
}
}
provider "helm" {
kubernetes {
host = azurerm_kubernetes_cluster.my_cluster.kube_admin_config.0.host
client_certificate = base64decode(azurerm_kubernetes_cluster.my_cluster.kube_admin_config.0.client_certificate)
client_key = base64decode(azurerm_kubernetes_cluster.my_cluster.kube_admin_config.0.client_key)
cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.my_cluster.kube_admin_config.0.cluster_ca_certificate)
}
}
resource "helm_release" "argocd" {
name = "argocd"
repository = "https://argoproj.github.io/argo-helm"
chart = "argo-cd"
version = "4.10.5"
create_namespace = true
namespace = "argocd"
}
Steps to Reproduce
All resources were created successfully when I executed the terraform code at first time creation.
But it was failed on terraform plan when I changed the vm_size in default node pool.
$ terraform plan
Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
with helm_release.argocd,
│ on argocd.tf line 1, in resource "helm_release" "argocd":
│ 1: resource "helm_release" "argocd" {
Expected Behavior
The cluster should be reachable even the vm_size was changed.
Actual Behavior
Kubernetes cluster is unreachable for other terraform providers (ex: kubernetes, helm)
Test
I removed resource argocd to prevent the above situation, then terraform could plan and apply successfully.
I get the cluster config data from azure portal, the azurePortalFQDN is different between first time creation.
Question
Will The cluster be recreated if I change default node pool config which is commented “Changing this forces a new resource to be created” on terraform documents? Or only the default node pool will be deleted then create new one however the aks cluster doesn't changed?
Why provider helm could connect to the cluster at first time creation, but it was failed to connect when the resource recreation?
Thanks for your reply.

Automated .kube/config file created with Terraform GKE module doesn't work

We are using the below terraform module to create the GKE cluster and the local config file. But the kubectl command doesn't work. The terminal just keeps on waiting. Then we have to manually run the command gcloud container clusters get-credentials to fetch and set up the local config credentials post the cluster creation, and then the kubectl command works. It doesn't feel elegant to execute the gcloud command after the terraform setup. Therefore please help to correct what's wrong.
terraform {
required_providers {
google = {
source = "hashicorp/google"
version = ">= 3.42.0"
}
google-beta = {
source = "hashicorp/google-beta"
version = ">= 3.0.0"
}
}
}
provider "google" {
credentials = var.gcp_creds
project = var.project_id
region = var.region
zone = var.zone
}
provider "google-beta" {
credentials = var.gcp_creds
project = var.project_id
region = var.region
zone = var.zone
}
module "gke" {
source = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
project_id = var.project_id
name = "cluster"
regional = true
region = var.region
zones = ["${var.region}-a", "${var.region}-b", "${var.region}-c"]
network = module.vpc.network_name
subnetwork = module.vpc.subnets_names[0]
ip_range_pods = "gke-pod-ip-range"
ip_range_services = "gke-service-ip-range"
horizontal_pod_autoscaling = true
enable_private_nodes = true
master_ipv4_cidr_block = "${var.control_plane_cidr}"
node_pools = [
{
name = "node-pool"
machine_type = "${var.machine_type}"
node_locations = "${var.region}-a,${var.region}-b,${var.region}-c"
min_count = "${var.node_pools_min_count}"
max_count = "${var.node_pools_max_count}"
disk_size_gb = "${var.node_pools_disk_size_gb}"
auto_repair = true
auto_upgrade = true
},
]
}
# Configure the authentication and authorisation of the cluster
module "gke_auth" {
source = "terraform-google-modules/kubernetes-engine/google//modules/auth"
depends_on = [module.gke]
project_id = var.project_id
location = module.gke.location
cluster_name = module.gke.name
}
# Define a local file that will store the necessary info such as certificate, user and endpoint to access the cluster
resource "local_file" "kubeconfig" {
content = module.gke_auth.kubeconfig_raw
filename = "~/.kube/config"
}
UPDATE:
We found that the automated .kube/config file was created with the wrong public IP of the API Server whereas the one created with the gcloud command contains the correct IP of the API Server. Any guesses why terraform one is fetching the wrong public IP?

How to create routing for kubernetes with nginx ingress in terraform - scaleway

I have a kubernetes setup with a cluster and two pools (nodes), I have also setup an (nginx) ingress server for kubernetes with helm. All of this is written in terraform for scaleway. What I am struggling with is how to config the ingress server to route to my kubernetes pools/nodes depending on the url path.
For example, I want [url]/api to go to my scaleway_k8s_pool.api and [url]/auth to go to my scaleway_k8s_pool.auth.
This is my terraform code
provider "scaleway" {
zone = "fr-par-1"
region = "fr-par"
}
resource "scaleway_registry_namespace" "main" {
name = "main_container_registry"
description = "Main container registry"
is_public = false
}
resource "scaleway_k8s_cluster" "main" {
name = "main"
description = "The main cluster"
version = "1.20.5"
cni = "calico"
tags = ["i'm an awsome tag"]
autoscaler_config {
disable_scale_down = false
scale_down_delay_after_add = "5m"
estimator = "binpacking"
expander = "random"
ignore_daemonsets_utilization = true
balance_similar_node_groups = true
expendable_pods_priority_cutoff = -5
}
}
resource "scaleway_k8s_pool" "api" {
cluster_id = scaleway_k8s_cluster.main.id
name = "api"
node_type = "DEV1-M"
size = 1
autoscaling = true
autohealing = true
min_size = 1
max_size = 5
}
resource "scaleway_k8s_pool" "auth" {
cluster_id = scaleway_k8s_cluster.main.id
name = "auth"
node_type = "DEV1-M"
size = 1
autoscaling = true
autohealing = true
min_size = 1
max_size = 5
}
resource "null_resource" "kubeconfig" {
depends_on = [scaleway_k8s_pool.api, scaleway_k8s_pool.auth] # at least one pool here
triggers = {
host = scaleway_k8s_cluster.main.kubeconfig[0].host
token = scaleway_k8s_cluster.main.kubeconfig[0].token
cluster_ca_certificate = scaleway_k8s_cluster.main.kubeconfig[0].cluster_ca_certificate
}
}
output "cluster_url" {
value = scaleway_k8s_cluster.main.apiserver_url
}
provider "helm" {
kubernetes {
host = null_resource.kubeconfig.triggers.host
token = null_resource.kubeconfig.triggers.token
cluster_ca_certificate = base64decode(
null_resource.kubeconfig.triggers.cluster_ca_certificate
)
}
}
resource "helm_release" "ingress" {
name = "ingress"
chart = "ingress-nginx"
repository = "https://kubernetes.github.io/ingress-nginx"
namespace = "kube-system"
}
How would i go about configuring the nginx ingress server for routing to my kubernetes pools?

Is it possible to create a zone only node pool in a regional cluster in GKE?

I have a regional cluster for redundancy. In this cluster I want to create a node-pool in just 1 zone in this region. Is this configuration possible? reason I trying this is, I want to run service like RabbitMQ in just 1 zone to avoid split, and my application services running on all zones in the region for redundancy.
I am using terraform to create the cluster and node pools, below is my config for creating region cluster and zone node pool
resource "google_container_cluster" "regional_cluster" {
provider = google-beta
project = "my-project"
name = "my-cluster"
location = "us-central1"
node_locations = ["us-central1-a", "us-central1-b", "us-central1-c"]
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
}
resource "google_container_node_pool" "one_zone" {
project = google_container_cluster.regional_cluster.project
name = "zone-pool"
location = "us-central1-b"
cluster = google_container_cluster.regional_cluster.name
node_config {
machine_type = var.machine_type
image_type = var.image_type
disk_size_gb = 100
disk_type = "pd-standard"
}
}
This throws an error message
error creating NodePool: googleapi: Error 404: Not found: projects/my-project/zones/us-central1-b/clusters/my-cluster., notFound
Found out that location in google_container_node_pool should specify cluster master's region/zone. To actually specify the node-pool location node_locations should be used. Below is the config that worked
resource "google_container_cluster" "regional_cluster" {
provider = google-beta
project = "my-project"
name = "my-cluster"
location = "us-central1"
node_locations = ["us-central1-a", "us-central1-b", "us-central1-c"]
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
}
resource "google_container_node_pool" "one_zone" {
project = google_container_cluster.regional_cluster.project
name = "zone-pool"
location = google_container_cluster.regional_cluster.location
node_locations = ["us-central1-b"]
cluster = google_container_cluster.regional_cluster.name
node_config {
machine_type = var.machine_type
image_type = var.image_type
disk_size_gb = 100
disk_type = "pd-standard"
}
}

Add secret to freshly created Azure AKS using Terraform Kubernetes provider fails

I am creating a kubernetes cluster with the Azure Terraform provider and trying to add a secret to it. The cluster gets created fine but I am getting errors with authenticating to the cluster when creating the secret. I tried 2 different Terraform Kubernetes provider configurations. Here is the main configuration:
variable "client_id" {}
variable "client_secret" {}
resource "azurerm_resource_group" "rg-example" {
name = "rg-example"
location = "East US"
}
resource "azurerm_kubernetes_cluster" "k8s-example" {
name = "k8s-example"
location = azurerm_resource_group.rg-example.location
resource_group_name = azurerm_resource_group.rg-example.name
dns_prefix = "k8s-example"
default_node_pool {
name = "default"
node_count = 1
vm_size = "Standard_B2s"
}
service_principal {
client_id = var.client_id
client_secret = var.client_secret
}
role_based_access_control {
enabled = true
}
}
resource "kubernetes_secret" "secret_example" {
metadata {
name = "mysecret"
}
data = {
"something" = "super secret"
}
depends_on = [
azurerm_kubernetes_cluster.k8s-example
]
}
provider "azurerm" {
version = "=2.29.0"
features {}
}
output "host" {
value = azurerm_kubernetes_cluster.k8s-example.kube_config.0.host
}
output "cluster_username" {
value = azurerm_kubernetes_cluster.k8s-example.kube_config.0.username
}
output "cluster_password" {
value = azurerm_kubernetes_cluster.k8s-example.kube_config.0.password
}
output "client_key" {
value = azurerm_kubernetes_cluster.k8s-example.kube_config.0.client_key
}
output "client_certificate" {
value = azurerm_kubernetes_cluster.k8s-example.kube_config.0.client_certificate
}
output "cluster_ca_certificate" {
value = azurerm_kubernetes_cluster.k8s-example.kube_config.0.cluster_ca_certificate
}
Here is the first Kubernetes provider configuration using certificates:
provider "kubernetes" {
version = "=1.13.2"
load_config_file = "false"
host = azurerm_kubernetes_cluster.k8s-example.kube_config.0.host
client_certificate = azurerm_kubernetes_cluster.k8s-example.kube_config.0.client_certificate
client_key = azurerm_kubernetes_cluster.k8s-example.kube_config.0.client_key
cluster_ca_certificate = azurerm_kubernetes_cluster.k8s-example.kube_config.0.cluster_ca_certificate
}
And the error I'm receiving:
kubernetes_secret.secret_example: Creating...
Error: Failed to configure client: tls: failed to find any PEM data in certificate input
Here is the second Kubernetes provider configuration using HTTP Basic Authorization:
provider "kubernetes" {
version = "=1.13.2"
load_config_file = "false"
host = azurerm_kubernetes_cluster.k8s-example.kube_config.0.host
username = azurerm_kubernetes_cluster.k8s-example.kube_config.0.username
password = azurerm_kubernetes_cluster.k8s-example.kube_config.0.password
}
And the error I'm receiving:
kubernetes_secret.secret_example: Creating...
Error: Post "https://k8s-example-c4a78c03.hcp.eastus.azmk8s.io:443/api/v1/namespaces/default/secrets": x509: certificate signed by unknown authority
ANALYSIS
I checked the outputs of azurerm_kubernetes_cluster.k8s-example and the data seems valid (username, password, host, etc..) Maybe I need a SSL certificate on my Kubernetes cluster, however I'm am not certain, as I'm new to this. Can someone help me out ?
According to this issue in hashicorp/terraform-provider-kubernetes, you need to use base64decode(). The example that author used:
provider "kubernetes" {
host = "${google_container_cluster.k8sexample.endpoint}"
username = "${var.master_username}"
password = "${var.master_password}"
client_certificate = "${base64decode(google_container_cluster.k8sexample.master_auth.0.client_certificate)}"
client_key = "${base64decode(google_container_cluster.k8sexample.master_auth.0.client_key)}"
cluster_ca_certificate = "${base64decode(google_container_cluster.k8sexample.master_auth.0.cluster_ca_certificate)}"
}
That author said they got the same error as you if they left out the base64decode. You can read more about that function here: https://www.terraform.io/docs/configuration/functions/base64decode.html