Terraform to prevent forced update of AWS EKS cluster - kubernetes

I am using the terraform-aws-modules/eks registry module:
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/12.1.0?tab=inputs
Today, with a new change to my TF configs (unrelated to EKS), I saw that my EKS worker nodes are going to be rebuilt due to an AMI update, which I am trying to prevent.
# module.kubernetes.module.eks-cluster.aws_launch_configuration.workers[0] must be replaced
+/- resource "aws_launch_configuration" "workers" {
      ~ arn                         = "arn:aws:autoscaling:us-east-2:555065427312:launchConfiguration:6c59fac6-5912-4079-8cc9-268a7f7fc98b:launchConfigurationName/edna-dev-eks-02020061119383942580000000b" -> (known after apply)
        associate_public_ip_address = false
        ebs_optimized               = true
        enable_monitoring           = true
        iam_instance_profile        = "edna-dev-eks20200611193836418800000007"
      ~ id                          = "edna-dev-eks-02020061119383942580000000b" -> (known after apply)
      ~ image_id                    = "ami-05fc7ae9bc84e5708" -> "ami-073f227b0cd9507f9" # forces replacement
        instance_type               = "t3.medium"
      + key_name                    = (known after apply)
      ~ name                        = "edna-dev-eks-02020061119383942580000000b" -> (known after apply)
        name_prefix                 = "edna-dev-eks-0"
        security_groups             = [
            "sg-09b14dfce82015a63",
        ]
The rebuild happens because EKS got an updated version of the AMI for the cluster's worker nodes.
This is my EKS Terraform config:
###################################################################################
# EKS CLUSTER #
# #
# This module contains configuration for EKS cluster running various applications #
###################################################################################
module "eks_label" {
source = "git::https://github.com/cloudposse/terraform-null-label.git?ref=master"
namespace = var.project
environment = var.environment
attributes = [var.component]
name = "eks"
}
data "aws_eks_cluster" "cluster" {
name = module.eks-cluster.cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
name = module.eks-cluster.cluster_id
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
load_config_file = false
version = "~> 1.9"
}
module "eks-cluster" {
source = "terraform-aws-modules/eks/aws"
cluster_name = module.eks_label.id
cluster_version = "1.16"
subnets = var.subnets
vpc_id = var.vpc_id
worker_groups = [
{
instance_type = var.cluster_node_type
asg_max_size = var.cluster_node_count
}
]
tags = var.tags
}
If I try to add a lifecycle block to the module config:
lifecycle {
  ignore_changes = [image_id]
}
I get this error:
➜ terraform plan
Error: Reserved block type name in module block
on modules/kubernetes/main.tf line 45, in module "eks-cluster":
45: lifecycle {
The block type name "lifecycle" is reserved for use by Terraform in a future
version.
Any ideas?

What about trying to use the worker_ami_name_filter variable for terraform-aws-modules/eks/aws to specifically find only your current AMI?
For example:
module "eks-cluster" {
source = "terraform-aws-modules/eks/aws"
cluster_name = module.eks_label.id
<...snip...>
worker_ami_name_filter = "amazon-eks-node-1.16-v20200531"
}
You can use the AWS web console or the CLI to map the AMI IDs to their names:
user#localhost:~$ aws ec2 describe-images --filters "Name=name,Values=amazon-eks-node-1.16*" --region us-east-2 --output json | jq '.Images[] | "\(.Name) \(.ImageId)"'
"amazon-eks-node-1.16-v20200423 ami-01782c0e32657accf"
"amazon-eks-node-1.16-v20200531 ami-05fc7ae9bc84e5708"
"amazon-eks-node-1.16-v20200609 ami-073f227b0cd9507f9"
"amazon-eks-node-1.16-v20200507 ami-0edc51bc2f03c9dc2"
But why are you trying to prevent the Auto Scaling Group from using a newer AMI? It will only apply the newer AMI to new nodes. It won't terminate existing nodes just to update them.

Kubernetes cluster unreachable when the vm_size was changed in azurerm_kubernetes_cluster

Terraform version
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.16.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "=2.11.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "=2.6.0"
    }
  }
  required_version = "=1.2.6"
}
Terraform Code
resource "azurerm_kubernetes_cluster" "my_cluster" {
name = local.cluster_name
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
dns_prefix = local.dns_prefix
node_resource_group = local.resource_group_node_name
kubernetes_version = "1.24.3"
automatic_channel_upgrade = "patch"
sku_tier = var.sku_tier
default_node_pool {
name = "default"
type = "VirtualMachineScaleSets"
vm_size = var.default_pool_vm_size
enable_auto_scaling = true
max_count = var.default_pool_max_count
min_count = var.default_pool_min_count
os_disk_type = "Ephemeral"
os_disk_size_gb = var.default_pool_os_disk_size_gb
}
identity {
type = "SystemAssigned"
}
network_profile {
network_plugin = "kubenet"
}
}
provider "helm" {
kubernetes {
host = azurerm_kubernetes_cluster.my_cluster.kube_admin_config.0.host
client_certificate = base64decode(azurerm_kubernetes_cluster.my_cluster.kube_admin_config.0.client_certificate)
client_key = base64decode(azurerm_kubernetes_cluster.my_cluster.kube_admin_config.0.client_key)
cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.my_cluster.kube_admin_config.0.cluster_ca_certificate)
}
}
resource "helm_release" "argocd" {
name = "argocd"
repository = "https://argoproj.github.io/argo-helm"
chart = "argo-cd"
version = "4.10.5"
create_namespace = true
namespace = "argocd"
}
Steps to Reproduce
All resources were created successfully when I ran the Terraform code for the first time.
But terraform plan failed after I changed the vm_size in the default node pool.
$ terraform plan
Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
with helm_release.argocd,
│ on argocd.tf line 1, in resource "helm_release" "argocd":
│ 1: resource "helm_release" "argocd" {
Expected Behavior
The cluster should be reachable even if the vm_size is changed.
Actual Behavior
The Kubernetes cluster is unreachable for other Terraform providers (e.g. kubernetes, helm).
Test
I removed the argocd resource to avoid the situation above; Terraform could then plan and apply successfully.
I got the cluster config data from the Azure portal; the azurePortalFQDN is different from the one at first-time creation.
Question
Will the cluster be recreated if I change a default node pool setting that is documented with "Changing this forces a new resource to be created" in the Terraform docs? Or will only the default node pool be deleted and recreated, while the AKS cluster itself stays unchanged?
Why can the helm provider connect to the cluster at first-time creation, but fail to connect when the resource is recreated?
Thanks for your reply.

Terraform Plan deleting AKS Node Pool

Terraform plan always forces the AKS cluster to be recreated if we increase the worker node count in the node pool.
I tried creating an AKS cluster with 1 worker node via Terraform. It went well, and the cluster is up and running.
After that, I tried to add one more worker node to my AKS. Terraform shows Plan: 2 to add, 0 to change, 2 to destroy.
I'm not sure how we can increase the worker node count in the AKS node pool if it deletes the existing node pool.
default_node_pool {
  name                  = var.nodepool_name
  vm_size               = var.instance_type
  orchestrator_version  = data.azurerm_kubernetes_service_versions.current.latest_version
  availability_zones    = var.zones
  enable_auto_scaling   = var.node_autoscalling
  node_count            = var.instance_count
  enable_node_public_ip = var.publicip
  vnet_subnet_id        = data.azurerm_subnet.subnet.id

  node_labels = {
    "node_pool_type" = var.tags[0].node_pool_type
    "environment"    = var.tags[0].environment
    "nodepool_os"    = var.tags[0].nodepool_os
    "application"    = var.tags[0].application
    "manged_by"      = var.tags[0].manged_by
  }
}
Error
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
# azurerm_kubernetes_cluster.aks_cluster must be replaced
-/+ resource "azurerm_kubernetes_cluster" "aks_cluster" {
Thanks
Satyam
I tested the same in my environment by creating a cluster with a node count of 2 and then changing it to 3, using something like the below.
If you are using http_proxy_config, it will by default force a replacement of that block, and that's the reason the whole cluster gets replaced with the new configuration.
So, as a solution, you can use a lifecycle block in your code as I have done below:
lifecycle {
  ignore_changes = [http_proxy_config]
}
The code will be:
resource "azurerm_kubernetes_cluster" "aks_cluster" {
name = "${var.global-prefix}-${var.cluster-id}-${var.envid}-azwe-aks-01"
location = data.azurerm_resource_group.example.location
resource_group_name = data.azurerm_resource_group.example.name
dns_prefix = "${var.global-prefix}-${var.cluster-id}-${var.envid}-azwe-aks-01"
kubernetes_version = var.cluster-version
private_cluster_enabled = var.private_cluster
default_node_pool {
name = var.nodepool_name
vm_size = var.instance_type
orchestrator_version = data.azurerm_kubernetes_service_versions.current.latest_version
availability_zones = var.zones
enable_auto_scaling = var.node_autoscalling
node_count = var.instance_count
enable_node_public_ip = var.publicip
vnet_subnet_id = azurerm_subnet.example.id
}
# RBAC and Azure AD Integration Block
role_based_access_control {
enabled = true
}
http_proxy_config {
http_proxy = "http://xxxx"
https_proxy = "http://xxxx"
no_proxy = ["localhost","xxx","xxxx"]
}
# Identity (System Assigned or Service Principal)
identity {
type = "SystemAssigned"
}
# Add On Profiles
addon_profile {
azure_policy {enabled = true}
}
# Network Profile
network_profile {
network_plugin = "azure"
network_policy = "calico"
}
lifecycle {
ignore_changes = [http_proxy_config]
}
}

Managing GCP Composer Kubernetes cluster using Terraform

I've created a GCP Composer environment using Terraform:
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "3.5.0"
    }
  }
}

provider "google" {
  credentials = file("my_key.json")
  project     = "my_project_id"
  region      = "us-east1"
  zone        = "us-east1-b"
}

resource "google_composer_environment" "my_composer_id" {
  name   = "my_composer_name"
  region = "us-east1"

  config {
    node_count = 3

    node_config {
      zone         = "us-east1-b"
      machine_type = "n1-standard-1"
    }
  }
}
Composer also automatically creates a Kubernetes Engine cluster. That cluster has a single node pool called default-pool. I'd like to create a new node pool inside the cluster created by Composer, something like this:
resource "google_container_node_pool" "my_node_pool_id" {
name = "my_node_pool_name"
location = "us-east1"
cluster = ????
node_count = 0
node_config {
preemptible = true
machine_type = "n1-standard-1"
}
autoscaling {
min_node_count = 0
max_node_count = 3
}
}
However, since I didn't create the cluster in the Terraform file (it was automatically created by Composer), I don't have a reference to it.
The cluster name can be accessed via the gke_cluster key available in the config section of your Cloud Composer environment:
resource "google_container_node_pool" "my_node_pool_id" {
name = "my_node_pool_name"
location = "us-east1-b"
cluster = element(
split("/",
lookup(
google_composer_environment.my_composer_id.config[0],
"gke_cluster"
)
),
5
)
// ...
}
The element at index 5 corresponds to the name of the GKE cluster.
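For reference, gke_cluster holds the full resource path of the cluster, assumed here to look like projects/<project>/zones/<zone>/clusters/<name> for this zonal Composer setup, which is why splitting on "/" and taking index 5 yields the name. A locals sketch under that assumption makes the indexing a little more readable and also recovers the zone:
locals {
  # assumed format: projects/<project>/zones/<zone>/clusters/<name>
  gke_cluster_path = google_composer_environment.my_composer_id.config[0].gke_cluster
  gke_cluster_name = element(split("/", local.gke_cluster_path), 5)
  gke_cluster_zone = element(split("/", local.gke_cluster_path), 3)
}

resource "google_container_node_pool" "my_node_pool_id" {
  name     = "my_node_pool_name"
  location = local.gke_cluster_zone
  cluster  = local.gke_cluster_name
  // ...
}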

How can I name eks worker nodes provisioned with terraform?

I am using Terraform 0.12.20 and I have provisioned an EKS cluster with 2 node groups.
How can I add Name tags to the EKS worker nodes according to their node group names?
I have tried adding a "Name" tag in the additional_tags section of each node group, but the tag did not take effect: my EC2 instance names are empty, while the other tags appear.
Here is the configuration - I have skipped the less relevant bits:
module "eks-cluster" {
...
node_groups_defaults = {
disk_size = 128
key_name = var.key_name
subnets = [
aws_subnet.{{}}.id,
aws_subnet.{{}}.id,
]
k8s_labels = {
env = var.environment
}
additional_tags = {
env = var.environment
"k8s.io/cluster-autoscaler/enabled" = "true"
"k8s.io/cluster-autoscaler/${var.cluster-name}" = "true"
}
}
node_groups = {
app = {
name = "app"
.....
k8s_labels = {
nodegroup = "app"
}
additional_tags = {
nodegroup = "app"
Name = "${var.environment}-app-node"
}
}
ml = {
name = "ml"
...
instance_type = "m5.xlarge"
k8s_labels = {
nodegroup = "ml"
}
additional_tags = {
nodegroup = "ml"
Name = "${var.environment}-ml-node"
}
}
}
tags = {
env = var.environment
}
map_roles = [{
......
}]
}
As per the documentation, Resource: aws_eks_node_group doesn't allow modifying tags on your instances.
There is a nice feature coming soon to EKS node groups which will allow you to pass a custom userdata script. Using that, you will be able to modify tags for your instances programmatically. The issue can be tracked here -> https://github.com/aws/containers-roadmap/issues/596
UPDATE:
As of 20/08/2020, you can now utilise launch_template with your node group. This will allow you to pass in a Name tag. Example:
resource "aws_launch_template" "cluster" {
image_id = data.aws_ssm_parameter.cluster.value
instance_type = "t3.medium"
name = "eks-launch-template-test"
update_default_version = true
tag_specifications {
resource_type = "instance"
tags = {
Name = "eks-node-group-instance-name"
}
}
}
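To take effect, the launch template still has to be referenced from the node group. Here is a minimal sketch using the standalone aws_eks_node_group resource; the cluster name, role ARN and subnets are assumed placeholders, and the eks module's node_groups maps accept equivalent launch template settings:
resource "aws_eks_node_group" "app" {
  cluster_name    = module.eks-cluster.cluster_id # assumed module output
  node_group_name = "app"
  node_role_arn   = var.node_role_arn # assumed IAM role for the nodes
  subnet_ids      = var.subnets

  scaling_config {
    desired_size = 2
    min_size     = 1
    max_size     = 3
  }

  launch_template {
    id      = aws_launch_template.cluster.id
    version = aws_launch_template.cluster.latest_version
  }

  # Note: because the launch template pins image_id, EKS will not inject its
  # own bootstrap user data, so the template must also supply it.
}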
The following Terraform resource works.
resource "aws_autoscaling_group_tag" "your_group_tag" {
autoscaling_group_name = aws_eks_node_group.your_group.resources[0].autoscaling_groups[0].name
tag {
key = "Name"
value = "enter-your-name-here"
propagate_at_launch = true
}
depends_on = [
aws_eks_node_group.your_group
]
}
Had the same issue, and this is the solution I came up with, which works great.
First, create the ASG tags via the aws_autoscaling_group_tag resource.
resource "aws_autoscaling_group_tag" "mytag" {
autoscaling_group_name = aws_eks_node_group.main.resources[0].autoscaling_groups[0].name
tag {
key = "foo"
value = "bar"
propagate_at_launch = true
}
depends_on = [aws_eks_node_group.main]
}
Unfortunately, this resource doesn't accept multiple tags, so you'd have to create one resource block per tag - for example with for_each, as in the sketch below.
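A sketch of that workaround using for_each, creating one aws_autoscaling_group_tag per entry in a tag map (the tag values below are placeholders):
resource "aws_autoscaling_group_tag" "node_tags" {
  for_each = {
    Name      = "my-eks-node" # placeholder
    nodegroup = "app"         # placeholder
  }

  autoscaling_group_name = aws_eks_node_group.main.resources[0].autoscaling_groups[0].name

  tag {
    key                 = each.key
    value               = each.value
    propagate_at_launch = true
  }

  depends_on = [aws_eks_node_group.main]
}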
Another thing to keep in mind is that the tags are only applied to EC2 instances scaled up in the future, not to the currently running ones.
This means you need to either manually scale your nodes down and back up, or write a bash script and run it as a local-exec provisioner with Terraform.
resource "null_resource" "refresh_autoscale" {
provisioner "local-exec" {
command = "cd ${path.module}/scripts ; bash ./scale_refresh.sh"
environment = {
ASG_NAME = aws_eks_node_group.main.resources[0].autoscaling_groups[0].name
CLUSTER_NAME = "foo_cluster"
NODE_GROUP_NAME = "foo_cluster_node"
REGION = var.region
AWS_PROFILE = var.aws_profile
DESIRED_SIZE = var.desired_size
MIN_SIZE = var.min_size
MAX_SIZE = var.max_size
}
}
depends_on = [aws_eks_node_group.main]
}
Your bash script can run commands via the AWS CLI to scale down and up your node groups.
aws --profile ${AWS_PROFILE} --region ${REGION} eks update-nodegroup-config --cluster-name ${CLUSTER_NAME} \
--scaling-config "minSize=0,maxSize=1,desiredSize=0" --nodegroup-name ${NODE_GROUP_NAME}
Because the instances are not scaled down immediately, there is a period of waiting for the scale down to complete. If you have jq installed, you can periodically query the state of your ASG and see how many instances are currently running.
INSTANCE_COUNT=$(aws --profile ${AWS_PROFILE} --region ${REGION} autoscaling describe-auto-scaling-groups --auto-scaling-group-name ${ASG_NAME} \
| jq '.[][0] | .Instances | length')
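One caveat with the null_resource approach above, as an assumption-flagged note: its provisioner only runs when the null_resource is first created, so if you want the refresh to run again after later scaling changes, add a triggers map that changes along with them, for example:
resource "null_resource" "refresh_autoscale" {
  # Re-run the refresh whenever the intended scaling configuration changes
  # (the choice of trigger values here is just an example).
  triggers = {
    desired_size = var.desired_size
    min_size     = var.min_size
    max_size     = var.max_size
  }

  provisioner "local-exec" {
    command = "cd ${path.module}/scripts ; bash ./scale_refresh.sh"
    # ... same environment block as above ...
  }

  depends_on = [aws_eks_node_group.main]
}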
I just noticed that
"k8s.io/cluster-autoscaler/${var.cluster-name}" = "true"
Might need to be
"k8s.io/cluster-autoscaler/${var.cluster-name}" = "owned"
There is an existing issue about adding the "Name" tag to the node group's ASG: https://github.com/aws/containers-roadmap/issues/608 (open), and this one on the Terraform end: https://github.com/terraform-aws-modules/terraform-aws-eks/issues/860 (closed).
However, there is an alternative: use an AWS CLI command to add the tag explicitly.
Try the below in Terraform:
resource "null_resource" "add_custom_tags_to_asg" {
for_each = module.eks-cluster.node_groups
provisioner "local-exec" {
command = <<EOF
aws autoscaling create-or-update-tags \
--tags ResourceId=${each.value.resources[0].autoscaling_groups[0].name},ResourceType=auto-scaling-group,Key=Name,Value=k8s-node-groups-${each.value.labels["env"]},PropagateAtLaunch=true
EOF
}
}

Terraform - staggered provider population

I have been looking at implementing Kubernetes with Terraform over the past week and I seem to have a lifecycle issue.
While I can make a Kubernetes resource depend on a cluster being spun up, the KUBECONFIG file isn't updated in the middle of the terraform apply.
The Kubernetes service resource looks like this:
resource "kubernetes_service" "example" {
  ...
  depends_on = ["digitalocean_kubernetes_cluster.example"]
}

resource "digitalocean_kubernetes_cluster" "example" {
  name    = "example"
  region  = "${var.region}"
  version = "1.12.1-do.2"

  node_pool {
    name       = "woker-pool"
    size       = "s-1vcpu-2gb"
    node_count = 1
  }

  provisioner "local-exec" {
    command = "sh ./get-kubeconfig.sh" // gets KUBECONFIG file from digitalocean API.
    environment = {
      digitalocean_kubernetes_cluster_id   = "${digitalocean_kubernetes_cluster.k8s.id}"
      digitalocean_kubernetes_cluster_name = "${digitalocean_kubernetes_cluster.k8s.name}"
      digitalocean_api_token               = "${var.digitalocean_token}"
    }
  }
}
While I can pull the config file down using the API, Terraform won't use this file, because the terraform plan is already in motion.
I've seen some examples using ternary operators (resource ? 1 : 0), but I haven't found a workaround for non-count-created clusters besides -target.
Ideally, I'd like to create this with one Terraform repo.
It turns out that the digitalocean_kubernetes_cluster resource has attributes which can be passed to the provider "kubernetes" {} like so:
resource "digitalocean_kubernetes_cluster" "k8s" {
name = "k8s"
region = "${var.region}"
version = "1.12.1-do.2"
node_pool {
name = "woker-pool"
size = "s-1vcpu-2gb"
node_count = 1
}
}
provider "kubernetes" {
host = "${digitalocean_kubernetes_cluster.k8s.endpoint}"
client_certificate = "${base64decode(digitalocean_kubernetes_cluster.k8s.kube_config.0.client_certificate)}"
client_key = "${base64decode(digitalocean_kubernetes_cluster.k8s.kube_config.0.client_key)}"
cluster_ca_certificate = "${base64decode(digitalocean_kubernetes_cluster.k8s.kube_config.0.cluster_ca_certificate)}"
}
It results in one provider being dependent on the other, and it acts accordingly.
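Because the provider configuration references the cluster's own attributes, any resource that uses this provider is implicitly created after the cluster, so the explicit depends_on from the question is no longer needed. A minimal sketch (service name, selector and ports are placeholders):
resource "kubernetes_service" "example" {
  metadata {
    name = "example"
  }

  spec {
    selector = {
      app = "example" # placeholder selector
    }

    port {
      port        = 80
      target_port = 8080
    }
  }

  # No depends_on needed: the kubernetes provider above already references
  # digitalocean_kubernetes_cluster.k8s, so the cluster is created first.
}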