I read through the karpenter document at https://karpenter.sh/v0.16.1/getting-started/getting-started-with-terraform/#install-karpenter-helm-chart. I followed instructions step by step. I got errors at the end.
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
DEBUG controller.provisioning Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.topologySpreadConstraints = {"maxSkew":1,"topologyKey":"topology.kubernetes.io/zone","whenUnsatisfiable":"ScheduleAnyway","labelSelector":{"matchLabels":{"app.kubernetes.io/instance":"karpenter","app.kubernetes.io/name":"karpenter"}}} {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
2022-09-10T00:13:13.122Z
ERROR controller.provisioning Could not schedule pod, incompatible with provisioner "default", incompatible requirements, key karpenter.sh/provisioner-name, karpenter.sh/provisioner-name DoesNotExist not in karpenter.sh/provisioner-name In [default] {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
Below is the source code:
cat main.tf
terraform {
required_version = "~> 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 4.0"
}
helm = {
source = "hashicorp/helm"
version = "~> 2.5"
}
kubectl = {
source = "gavinbunney/kubectl"
version = "~> 1.14"
}
}
}
provider "aws" {
region = "us-east-1"
}
locals {
cluster_name = "karpenter-demo"
# Used to determine correct partition (i.e. - `aws`, `aws-gov`, `aws-cn`, etc.)
partition = data.aws_partition.current.partition
}
data "aws_partition" "current" {}
module "vpc" {
# https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest
source = "terraform-aws-modules/vpc/aws"
version = "3.14.4"
name = local.cluster_name
cidr = "10.0.0.0/16"
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
one_nat_gateway_per_az = false
public_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/elb" = 1
}
private_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = 1
}
}
module "eks" {
# https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest
source = "terraform-aws-modules/eks/aws"
version = "18.29.0"
cluster_name = local.cluster_name
cluster_version = "1.22"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
# Required for Karpenter role below
enable_irsa = true
node_security_group_additional_rules = {
ingress_nodes_karpenter_port = {
description = "Cluster API to Node group for Karpenter webhook"
protocol = "tcp"
from_port = 8443
to_port = 8443
type = "ingress"
source_cluster_security_group = true
}
}
node_security_group_tags = {
# NOTE - if creating multiple security groups with this module, only tag the
# security group that Karpenter should utilize with the following tag
# (i.e. - at most, only one security group should have this tag in your account)
"karpenter.sh/discovery/${local.cluster_name}" = local.cluster_name
}
# Only need one node to get Karpenter up and running.
# This ensures core services such as VPC CNI, CoreDNS, etc. are up and running
# so that Karpenter can be deployed and start managing compute capacity as required
eks_managed_node_groups = {
initial = {
instance_types = ["m5.large"]
# Not required nor used - avoid tagging two security groups with same tag as well
create_security_group = false
min_size = 1
max_size = 1
desired_size = 1
iam_role_additional_policies = [
"arn:${local.partition}:iam::aws:policy/AmazonSSMManagedInstanceCore", # Required by Karpenter
"arn:${local.partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy",
"arn:${local.partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly", #for access to ECR images
"arn:${local.partition}:iam::aws:policy/CloudWatchAgentServerPolicy"
]
tags = {
# This will tag the launch template created for use by Karpenter
"karpenter.sh/discovery/${local.cluster_name}" = local.cluster_name
}
}
}
}
#The EKS module creates an IAM role for the EKS managed node group nodes. We’ll use that for Karpenter.
#We need to create an instance profile we can reference.
#Karpenter can use this instance profile to launch new EC2 instances and those instances will be able to connect to your cluster.
resource "aws_iam_instance_profile" "karpenter" {
name = "KarpenterNodeInstanceProfile-${local.cluster_name}"
role = module.eks.eks_managed_node_groups["initial"].iam_role_name
}
#Create the KarpenterController IAM Role
#Karpenter requires permissions like launching instances, which means it needs an IAM role that grants it access. The config
#below will create an AWS IAM Role, attach a policy, and authorize the Service Account to assume the role using IRSA. We will
#create the ServiceAccount and connect it to this role during the Helm chart install.
module "karpenter_irsa" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "5.3.3"
role_name = "karpenter-controller-${local.cluster_name}"
attach_karpenter_controller_policy = true
karpenter_tag_key = "karpenter.sh/discovery/${local.cluster_name}"
karpenter_controller_cluster_id = module.eks.cluster_id
karpenter_controller_node_iam_role_arns = [
module.eks.eks_managed_node_groups["initial"].iam_role_arn
]
oidc_providers = {
ex = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["karpenter:karpenter"]
}
}
}
#Install Karpenter Helm Chart
#Use helm to deploy Karpenter to the cluster. We are going to use the helm_release Terraform resource to do the deploy and pass in the
#cluster details and IAM role Karpenter needs to assume.
provider "helm" {
kubernetes {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", local.cluster_name]
}
}
}
resource "helm_release" "karpenter" {
namespace = "karpenter"
create_namespace = true
name = "karpenter"
repository = "https://charts.karpenter.sh"
chart = "karpenter"
version = "v0.16.1"
set {
name = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
value = module.karpenter_irsa.iam_role_arn
}
set {
name = "clusterName"
value = module.eks.cluster_id
}
set {
name = "clusterEndpoint"
value = module.eks.cluster_endpoint
}
set {
name = "aws.defaultInstanceProfile"
value = aws_iam_instance_profile.karpenter.name
}
}
#Provisioner
#Create a default provisioner using the command below. This provisioner configures instances to connect to your cluster’s endpoint and
#discovers resources like subnets and security groups using the cluster’s name.
#This provisioner will create capacity as long as the sum of all created capacity is less than the specified limit.
provider "kubectl" {
apply_retry_count = 5
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
load_config_file = false
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
}
}
resource "kubectl_manifest" "karpenter_provisioner" {
yaml_body = <<-YAML
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
name: default
spec:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
limits:
resources:
cpu: 1000
provider:
subnetSelector:
Name: "*private*"
securityGroupSelector:
karpenter.sh/discovery/${module.eks.cluster_id}: ${module.eks.cluster_id}
tags:
karpenter.sh/discovery/${module.eks.cluster_id}: ${module.eks.cluster_id}
ttlSecondsAfterEmpty: 30
YAML
depends_on = [
helm_release.karpenter
]
}
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: inflate
spec:
replicas: 0
selector:
matchLabels:
app: inflate
template:
metadata:
labels:
app: inflate
spec:
terminationGracePeriodSeconds: 0
containers:
- name: inflate
image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
resources:
requests:
cpu: 1
EOF
kubectl scale deployment inflate --replicas 5
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
DEBUG controller.provisioning Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.topologySpreadConstraints = {"maxSkew":1,"topologyKey":"topology.kubernetes.io/zone","whenUnsatisfiable":"ScheduleAnyway","labelSelector":{"matchLabels":{"app.kubernetes.io/instance":"karpenter","app.kubernetes.io/name":"karpenter"}}} {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
2022-09-10T00:13:13.122Z
ERROR controller.provisioning Could not schedule pod, incompatible with provisioner "default", incompatible requirements, key karpenter.sh/provisioner-name, karpenter.sh/provisioner-name DoesNotExist not in karpenter.sh/provisioner-name In [default] {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
I belive this is due to the pod topology defined in the Karpenter deployment here:
https://github.com/aws/karpenter/blob/main/charts/karpenter/values.yaml#L73-L77
, you can read further on what pod topologySpreadConstraints does here:
https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/
If you increase the desired_size to 2 which matches the default deployment replicas above, that should resove the error.
I am using terraform aws eks registry module
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/12.1.0?tab=inputs
Today with a new change to TF configs (unrelated to EKS) I saw that my EKS worker nodes are going to be rebuilt due to AMI updates which I am trying to prevent.
# module.kubernetes.module.eks-cluster.aws_launch_configuration.workers[0] must be replaced
+/- resource "aws_launch_configuration" "workers" {
~ arn = "arn:aws:autoscaling:us-east-2:555065427312:launchConfiguration:6c59fac6-5912-4079-8cc9-268a7f7fc98b:launchConfigurationName/edna-dev-eks-02020061119383942580000000b" -> (known after apply)
associate_public_ip_address = false
ebs_optimized = true
enable_monitoring = true
iam_instance_profile = "edna-dev-eks20200611193836418800000007"
~ id = "edna-dev-eks-02020061119383942580000000b" -> (known after apply)
~ image_id = "ami-05fc7ae9bc84e5708" -> "ami-073f227b0cd9507f9" # forces replacement
instance_type = "t3.medium"
+ key_name = (known after apply)
~ name = "edna-dev-eks-02020061119383942580000000b" -> (known after apply)
name_prefix = "edna-dev-eks-0"
security_groups = [
"sg-09b14dfce82015a63",
]
The rebuild happens because EKS got updated version of the AMI for worker nodes of the cluster.
This is my EKS terraform config
###################################################################################
# EKS CLUSTER #
# #
# This module contains configuration for EKS cluster running various applications #
###################################################################################
module "eks_label" {
source = "git::https://github.com/cloudposse/terraform-null-label.git?ref=master"
namespace = var.project
environment = var.environment
attributes = [var.component]
name = "eks"
}
data "aws_eks_cluster" "cluster" {
name = module.eks-cluster.cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
name = module.eks-cluster.cluster_id
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
load_config_file = false
version = "~> 1.9"
}
module "eks-cluster" {
source = "terraform-aws-modules/eks/aws"
cluster_name = module.eks_label.id
cluster_version = "1.16"
subnets = var.subnets
vpc_id = var.vpc_id
worker_groups = [
{
instance_type = var.cluster_node_type
asg_max_size = var.cluster_node_count
}
]
tags = var.tags
}
If I am trying to add lifecycle block in the module config
lifecycle {
ignore_changes = [image_id]
}
I get error:
➜ terraform plan
Error: Reserved block type name in module block
on modules/kubernetes/main.tf line 45, in module "eks-cluster":
45: lifecycle {
The block type name "lifecycle" is reserved for use by Terraform in a future
version.
Any ideas?
What about trying to use the worker_ami_name_filter variable for terraform-aws-modules/eks/aws to specifically find only your current AMI?
For example:
module "eks-cluster" {
source = "terraform-aws-modules/eks/aws"
cluster_name = module.eks_label.id
<...snip...>
worker_ami_name_filter = "amazon-eks-node-1.16-v20200531"
}
You can use AWS web console or cli to map the AMI IDs to their names:
user#localhost:~$ aws ec2 describe-images --filters "Name=name,Values=amazon-eks-node-1.16*" --region us-east-2 --output json | jq '.Images[] | "\(.Name) \(.ImageId)"'
"amazon-eks-node-1.16-v20200423 ami-01782c0e32657accf"
"amazon-eks-node-1.16-v20200531 ami-05fc7ae9bc84e5708"
"amazon-eks-node-1.16-v20200609 ami-073f227b0cd9507f9"
"amazon-eks-node-1.16-v20200507 ami-0edc51bc2f03c9dc2"
But why are you trying to prevent the Auto Scaling Group from using a newer AMI? It will only apply the newer AMI to new nodes. It won't terminate existing nodes just to update them.
I'm using the AWS EKS provider (github.com/terraform-aws-modules/terraform-aws-eks ). I'm following along the tutorial with https://learn.hashicorp.com/terraform/aws/eks-intro
However this does not seem to have autoscaling enabled... It seems it's missing the cluster-autoscaler pod / daemon?
Is Terraform able to provision this functionality? Or do I need to set this up following a guide like: https://eksworkshop.com/scaling/deploy_ca/
You can deploy Kubernetes resources using Terraform. There is both a Kubernetes provider and a Helm provider.
data "aws_eks_cluster_auth" "authentication" {
name = "${var.cluster_id}"
}
provider "kubernetes" {
# Use the token generated by AWS iam authenticator to connect as the provider does not support exec auth
# see: https://github.com/terraform-providers/terraform-provider-kubernetes/issues/161
host = "${var.cluster_endpoint}"
cluster_ca_certificate = "${base64decode(var.cluster_certificate_authority_data)}"
token = "${data.aws_eks_cluster_auth.authentication.token}"
load_config_file = false
}
provider "helm" {
install_tiller = "true"
tiller_image = "gcr.io/kubernetes-helm/tiller:v2.12.3"
}
resource "helm_release" "cluster_autoscaler" {
name = "cluster-autoscaler"
repository = "stable"
chart = "cluster-autoscaler"
namespace = "kube-system"
version = "0.12.2"
set {
name = "autoDiscovery.enabled"
value = "true"
}
set {
name = "autoDiscovery.clusterName"
value = "${var.cluster_name}"
}
set {
name = "cloudProvider"
value = "aws"
}
set {
name = "awsRegion"
value = "${data.aws_region.current_region.name}"
}
set {
name = "rbac.create"
value = "true"
}
set {
name = "sslCertPath"
value = "/etc/ssl/certs/ca-bundle.crt"
}
}
This answer below is still not complete... But at least it gets me partially further...
1.
kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default
helm install stable/cluster-autoscaler --name my-release --set "autoscalingGroups[0].name=demo,autoscalingGroups[0].maxSize=10,autoscalingGroups[0].minSize=1" --set rbac.create=true
And then manually fix the certificate path:
kubectl edit deployments my-release-aws-cluster-autoscaler
replace the following:
path: /etc/ssl/certs/ca-bundle.crt
With
path: /etc/ssl/certs/ca-certificates.crt
2.
In the AWS console, give AdministratorAccess policy to the terraform-eks-demo-node role.
3.
Update the nodes parameter with (kubectl edit deployments my-release-aws-cluster-autoscaler)
- --nodes=1:10:terraform-eks-demo20190922124246790200000007
I have been looking at implementing Kubernetes with Terraform over the past week and I seem to have a lifecycle issue.
While I can make a Kubernetes resource depend on a cluster being spun up, the KUBECONFIG file isn't updated in the middle of the terraform apply.
The kubernete
resource "kubernetes_service" "example" {
...
depends_on = ["digitalocean_kubernetes_cluster.example"]
}
resource "digitalocean_kubernetes_cluster" "example" {
name = "example"
region = "${var.region}"
version = "1.12.1-do.2"
node_pool {
name = "woker-pool"
size = "s-1vcpu-2gb"
node_count = 1
}
provisioner "local-exec" {
command = "sh ./get-kubeconfig.sh" // gets KUBECONFIG file from digitalocean API.
environment = {
digitalocean_kubernetes_cluster_id = "${digitalocean_kubernetes_cluster.k8s.id}"
digitalocean_kubernetes_cluster_name = "${digitalocean_kubernetes_cluster.k8s.name}"
digitalocean_api_token = "${var.digitalocean_token}"
}
}
While I can pull the CONFIG file down using the API, terraform won't use this file, because the terraform plan is already in motion
I've seen some examples using ternary operators (resource ? 1 : 0) but I haven't found a workaround for non count created clusters besides -target
Ideally, I'd like to create this with one terraform repo.
It turns out that the digitalocean_kubernetes_cluster resource has an attribute which can be passed to the provider "kubernetes" {} like so:
resource "digitalocean_kubernetes_cluster" "k8s" {
name = "k8s"
region = "${var.region}"
version = "1.12.1-do.2"
node_pool {
name = "woker-pool"
size = "s-1vcpu-2gb"
node_count = 1
}
}
provider "kubernetes" {
host = "${digitalocean_kubernetes_cluster.k8s.endpoint}"
client_certificate = "${base64decode(digitalocean_kubernetes_cluster.k8s.kube_config.0.client_certificate)}"
client_key = "${base64decode(digitalocean_kubernetes_cluster.k8s.kube_config.0.client_key)}"
cluster_ca_certificate = "${base64decode(digitalocean_kubernetes_cluster.k8s.kube_config.0.cluster_ca_certificate)}"
}
It results in one provider being dependant on the other, and acts accordingly.
Does anyone successfully setup kubernetes executor/runner on gitlab for CI jobs? I set up mine but its stucking on executing my pipeline indefinitely.
I'm running a runner as a docker container on top of kubernetes cluster and connecting to my gitlab instance for handling my CI builds.
Any working config file would be appreciated.
My runner configuration looks like this:
[[runners]]
name = "kube-executor"
url = "https://gitlab.example.ltd/"
token = "some-token"
executor = "kubernetes"
[runners.cache]
[runners.kubernetes]
host = "https://my-kubernetes-api-address:443"
ca_file = "/etc/ssl/certs/ca.crt"
cert_file = "/etc/ssl/certs/server.crt"
key_file = "/etc/ssl/certs/server.key"
image = "docker:latest"
namespace = "gitlab"
namespace_overwrite_allowed = "ci-.*"
privileged = true
cpu_limit = "1"
memory_limit = "1Gi"
service_cpu_limit = "1"
service_memory_limit = "1Gi"
helper_cpu_limit = "500m"
helper_memory_limit = "100Mi"
poll_interval = 5
poll_timeout = 3600
[runners.kubernetes.volumes]
this throws this error: ERROR: Job failed (system failure): Post https://my-kubernetes-api-address:443/api/v1/namespaces/gitlab/secrets: x509: certificate signed by unknown authority
you are using https, so where are the certs, are they self signed certs? if yes you have to mention --tls-cert-file and --tls-private-key-file flags in your configmap.
Copied from https://stackoverflow.com/a/43362697/432115