EKS kube-system deployments CrashLoopBackOff - kubernetes

I am trying to deploy Kube State Metrics into the kube-system namespace in my EKS Cluster (eks.4) running Kubernetes v1.14.
Kubernetes Connection
provider "kubernetes" {
host = var.cluster.endpoint
token = data.aws_eks_cluster_auth.cluster_auth.token
cluster_ca_certificate = base64decode(var.cluster.certificate)
load_config_file = true
}
Deployment Manifest (as .tf)
resource "kubernetes_deployment" "kube_state_metrics" {
metadata {
name = "kube-state-metrics"
namespace = "kube-system"
labels = {
k8s-app = "kube-state-metrics"
}
}
spec {
replicas = 1
selector {
match_labels = {
k8s-app = "kube-state-metrics"
}
}
template {
metadata {
labels = {
k8s-app = "kube-state-metrics"
}
}
spec {
container {
name = "kube-state-metrics"
image = "quay.io/coreos/kube-state-metrics:v1.7.2"
port {
name = "http-metrics"
container_port = 8080
}
port {
name = "telemetry"
container_port = 8081
}
liveness_probe {
http_get {
path = "/healthz"
port = "8080"
}
initial_delay_seconds = 5
timeout_seconds = 5
}
readiness_probe {
http_get {
path = "/"
port = "8080"
}
initial_delay_seconds = 5
timeout_seconds = 5
}
}
service_account_name = "kube-state-metrics"
}
}
}
}
I have deployed all the required RBAC manifests from https://github.com/kubernetes/kube-state-metrics/tree/master/kubernetes as well - redacted here for brevity.
When I run terraform apply on the deployment above, the Terraform output is as follows:
kubernetes_deployment.kube_state_metrics: Still creating... [6m50s elapsed]
Eventually timing out at 10m.
Here are the logs from the kube-state-metrics pod:
I0910 23:41:19.412496 1 main.go:140] metric white-blacklisting: blacklisting the following items:
W0910 23:41:19.412535 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
W0910 23:41:19.412565 1 client_config.go:546] error creating inClusterConfig, falling back to default config: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
F0910 23:41:19.412782 1 main.go:148] Failed to create client: invalid configuration: no configuration has been provided

Adding the following to the pod spec got me to a successful deployment.
automount_service_account_token = true
For posterity:
resource "kubernetes_deployment" "kube_state_metrics" {
metadata {
name = "kube-state-metrics"
namespace = "kube-system"
labels = {
k8s-app = "kube-state-metrics"
}
}
spec {
replicas = 1
selector {
match_labels = {
k8s-app = "kube-state-metrics"
}
}
template {
metadata {
labels = {
k8s-app = "kube-state-metrics"
}
}
spec {
automount_service_account_token = true
container {
name = "kube-state-metrics"
image = "quay.io/coreos/kube-state-metrics:v1.7.2"
port {
name = "http-metrics"
container_port = 8080
}
port {
name = "telemetry"
container_port = 8081
}
liveness_probe {
http_get {
path = "/healthz"
port = "8080"
}
initial_delay_seconds = 5
timeout_seconds = 5
}
readiness_probe {
http_get {
path = "/"
port = "8080"
}
initial_delay_seconds = 5
timeout_seconds = 5
}
}
service_account_name = "kube-state-metrics"
}
}
}
}
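The RBAC manifests were redacted above; for completeness, a matching ServiceAccount expressed in Terraform might look roughly like this (a sketch only; the ClusterRole and ClusterRoleBinding from the upstream kube-state-metrics manifests are still required):
resource "kubernetes_service_account" "kube_state_metrics" {
  metadata {
    name      = "kube-state-metrics"
    namespace = "kube-system"
    labels = {
      k8s-app = "kube-state-metrics"
    }
  }
  # Assumption: also mounting the token on the ServiceAccount side,
  # mirroring the automount fix applied to the pod spec above.
  automount_service_account_token = true
}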

I didn't try it with Terraform.
I just ran this deployment locally and got the same error.
Please run your deployment locally to check the state of your deployment and pods.
I0910 13:25:49.632847 1 main.go:140] metric white-blacklisting: blacklisting the following items:
W0910 13:25:49.632871 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
and finally:
I0910 13:25:49.634748 1 main.go:185] Testing communication with server
I0910 13:25:49.650994 1 main.go:190] Running with Kubernetes cluster version: v1.12+. git version: v1.12.8-gke.10. git tree state: clean. commit: f53039cc1e5295eed20969a4f10fb6ad99461e37. platform: linux/amd64
I0910 13:25:49.651028 1 main.go:192] Communication with server successful
I0910 13:25:49.651598 1 builder.go:126] Active collectors: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,namespaces,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses
I0910 13:25:49.651607 1 main.go:226] Starting metrics server: 0.0.0.0:8080
I0910 13:25:49.652149 1 main.go:201] Starting kube-state-metrics self metrics server: 0.0.0.0:8081
verification:
Connected to kube-state-metrics (xx.xx.xx.xx) port 8080 (#0)
GET /metrics HTTP/1.1
Host: kube-state-metrics:8080
User-Agent: curl/7.58.0
Accept: */*
HTTP/1.1 200 OK
Content-Type: text/plain; version=0.0.4
Date: Tue, 10 Sep 2019 13:39:52 GMT
Transfer-Encoding: chunked
[49027 bytes data]
HELP kube_certificatesigningrequest_labels Kubernetes labels converted to Prometheus labels.
If you are building your own image, please follow the issues on GitHub and the docs.
Update: Just to clarify.
As mentioned in my answer, I didn't try this with Terraform, but the original question described only one problem: W0910 13:25:49.632871 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
So I suggested running the deployment locally and verifying all the errors in the logs. It later turned out that the problem was with automount_service_account_token; that important error wasn't included in the original question.
So please follow the Terraform issues on GitHub to see how you can solve this problem.
As per the description on GitHub:
I spent hours trying to figure out why a service account and deployment wasn't working in Terraform, but worked with no issues in kubectl - it was the AutomountServiceAccountToken being hardcoded to False in the deployment resource.
At a minimum this should be documented in the Terraform docs for the resource with something noting the resource does not behave like kubectl does.
I hope this explains the problem.

Related

ERROR controller.provisioning Could not schedule pod, incompatible with provisioner "default", incompatible requirements, key karpenter.sh/provisioner

I read through the Karpenter documentation at https://karpenter.sh/v0.16.1/getting-started/getting-started-with-terraform/#install-karpenter-helm-chart and followed the instructions step by step, but I got errors at the end.
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
DEBUG controller.provisioning Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.topologySpreadConstraints = {"maxSkew":1,"topologyKey":"topology.kubernetes.io/zone","whenUnsatisfiable":"ScheduleAnyway","labelSelector":{"matchLabels":{"app.kubernetes.io/instance":"karpenter","app.kubernetes.io/name":"karpenter"}}} {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
2022-09-10T00:13:13.122Z ERROR controller.provisioning Could not schedule pod, incompatible with provisioner "default", incompatible requirements, key karpenter.sh/provisioner-name, karpenter.sh/provisioner-name DoesNotExist not in karpenter.sh/provisioner-name In [default] {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
Below is the source code:
cat main.tf
terraform {
required_version = "~> 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 4.0"
}
helm = {
source = "hashicorp/helm"
version = "~> 2.5"
}
kubectl = {
source = "gavinbunney/kubectl"
version = "~> 1.14"
}
}
}
provider "aws" {
region = "us-east-1"
}
locals {
cluster_name = "karpenter-demo"
# Used to determine correct partition (i.e. - `aws`, `aws-gov`, `aws-cn`, etc.)
partition = data.aws_partition.current.partition
}
data "aws_partition" "current" {}
module "vpc" {
# https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest
source = "terraform-aws-modules/vpc/aws"
version = "3.14.4"
name = local.cluster_name
cidr = "10.0.0.0/16"
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
one_nat_gateway_per_az = false
public_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/elb" = 1
}
private_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = 1
}
}
module "eks" {
# https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest
source = "terraform-aws-modules/eks/aws"
version = "18.29.0"
cluster_name = local.cluster_name
cluster_version = "1.22"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
# Required for Karpenter role below
enable_irsa = true
node_security_group_additional_rules = {
ingress_nodes_karpenter_port = {
description = "Cluster API to Node group for Karpenter webhook"
protocol = "tcp"
from_port = 8443
to_port = 8443
type = "ingress"
source_cluster_security_group = true
}
}
node_security_group_tags = {
# NOTE - if creating multiple security groups with this module, only tag the
# security group that Karpenter should utilize with the following tag
# (i.e. - at most, only one security group should have this tag in your account)
"karpenter.sh/discovery/${local.cluster_name}" = local.cluster_name
}
# Only need one node to get Karpenter up and running.
# This ensures core services such as VPC CNI, CoreDNS, etc. are up and running
# so that Karpenter can be deployed and start managing compute capacity as required
eks_managed_node_groups = {
initial = {
instance_types = ["m5.large"]
# Not required nor used - avoid tagging two security groups with same tag as well
create_security_group = false
min_size = 1
max_size = 1
desired_size = 1
iam_role_additional_policies = [
"arn:${local.partition}:iam::aws:policy/AmazonSSMManagedInstanceCore", # Required by Karpenter
"arn:${local.partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy",
"arn:${local.partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly", #for access to ECR images
"arn:${local.partition}:iam::aws:policy/CloudWatchAgentServerPolicy"
]
tags = {
# This will tag the launch template created for use by Karpenter
"karpenter.sh/discovery/${local.cluster_name}" = local.cluster_name
}
}
}
}
#The EKS module creates an IAM role for the EKS managed node group nodes. We’ll use that for Karpenter.
#We need to create an instance profile we can reference.
#Karpenter can use this instance profile to launch new EC2 instances and those instances will be able to connect to your cluster.
resource "aws_iam_instance_profile" "karpenter" {
name = "KarpenterNodeInstanceProfile-${local.cluster_name}"
role = module.eks.eks_managed_node_groups["initial"].iam_role_name
}
#Create the KarpenterController IAM Role
#Karpenter requires permissions like launching instances, which means it needs an IAM role that grants it access. The config
#below will create an AWS IAM Role, attach a policy, and authorize the Service Account to assume the role using IRSA. We will
#create the ServiceAccount and connect it to this role during the Helm chart install.
module "karpenter_irsa" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "5.3.3"
role_name = "karpenter-controller-${local.cluster_name}"
attach_karpenter_controller_policy = true
karpenter_tag_key = "karpenter.sh/discovery/${local.cluster_name}"
karpenter_controller_cluster_id = module.eks.cluster_id
karpenter_controller_node_iam_role_arns = [
module.eks.eks_managed_node_groups["initial"].iam_role_arn
]
oidc_providers = {
ex = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["karpenter:karpenter"]
}
}
}
#Install Karpenter Helm Chart
#Use helm to deploy Karpenter to the cluster. We are going to use the helm_release Terraform resource to do the deploy and pass in the
#cluster details and IAM role Karpenter needs to assume.
provider "helm" {
kubernetes {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", local.cluster_name]
}
}
}
resource "helm_release" "karpenter" {
namespace = "karpenter"
create_namespace = true
name = "karpenter"
repository = "https://charts.karpenter.sh"
chart = "karpenter"
version = "v0.16.1"
set {
name = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
value = module.karpenter_irsa.iam_role_arn
}
set {
name = "clusterName"
value = module.eks.cluster_id
}
set {
name = "clusterEndpoint"
value = module.eks.cluster_endpoint
}
set {
name = "aws.defaultInstanceProfile"
value = aws_iam_instance_profile.karpenter.name
}
}
#Provisioner
#Create a default provisioner using the command below. This provisioner configures instances to connect to your cluster’s endpoint and
#discovers resources like subnets and security groups using the cluster’s name.
#This provisioner will create capacity as long as the sum of all created capacity is less than the specified limit.
provider "kubectl" {
apply_retry_count = 5
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
load_config_file = false
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
}
}
resource "kubectl_manifest" "karpenter_provisioner" {
yaml_body = <<-YAML
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
name: default
spec:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
limits:
resources:
cpu: 1000
provider:
subnetSelector:
Name: "*private*"
securityGroupSelector:
karpenter.sh/discovery/${module.eks.cluster_id}: ${module.eks.cluster_id}
tags:
karpenter.sh/discovery/${module.eks.cluster_id}: ${module.eks.cluster_id}
ttlSecondsAfterEmpty: 30
YAML
depends_on = [
helm_release.karpenter
]
}
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: inflate
spec:
replicas: 0
selector:
matchLabels:
app: inflate
template:
metadata:
labels:
app: inflate
spec:
terminationGracePeriodSeconds: 0
containers:
- name: inflate
image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
resources:
requests:
cpu: 1
EOF
kubectl scale deployment inflate --replicas 5
kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller
DEBUG controller.provisioning Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.topologySpreadConstraints = {"maxSkew":1,"topologyKey":"topology.kubernetes.io/zone","whenUnsatisfiable":"ScheduleAnyway","labelSelector":{"matchLabels":{"app.kubernetes.io/instance":"karpenter","app.kubernetes.io/name":"karpenter"}}} {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
2022-09-10T00:13:13.122Z ERROR controller.provisioning Could not schedule pod, incompatible with provisioner "default", incompatible requirements, key karpenter.sh/provisioner-name, karpenter.sh/provisioner-name DoesNotExist not in karpenter.sh/provisioner-name In [default] {"commit": "b157d45", "pod": "karpenter/karpenter-5755bb5b54-rh65t"}
I believe this is due to the pod topology spread constraints defined in the Karpenter deployment here:
https://github.com/aws/karpenter/blob/main/charts/karpenter/values.yaml#L73-L77
You can read further on what pod topologySpreadConstraints does here:
https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/
If you increase the desired_size to 2, which matches the default deployment replica count above, that should resolve the error.
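In the eks_managed_node_groups block from main.tf, that change could look roughly like this (a sketch of the sizing fields only; max_size has to be raised along with desired_size, and everything else stays as it is):
  eks_managed_node_groups = {
    initial = {
      instance_types = ["m5.large"]
      min_size       = 1
      max_size       = 2
      desired_size   = 2 # matches the Karpenter chart's default replica count
      # ... remaining settings unchanged ...
    }
  }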

Cannot access container from NodePort using Kubernetes ingress istio

I'm learning Istio, so I followed the instructions here.
Since I'm using Terraform, I converted the YAML files to Terraform and installed Istio via Helm:
locals {
istio_charts_url = "https://istio-release.storage.googleapis.com/charts"
}
resource "helm_release" "istio-base" {
name = "istio-base"
repository = local.istio_charts_url
chart = "base"
namespace = "istio-system"
create_namespace = true
}
resource "helm_release" "istiod" {
name = "istiod"
repository = local.istio_charts_url
chart = "istiod"
namespace = "istio-system"
depends_on = [helm_release.istio-base]
}
resource "kubernetes_namespace" "istio-ingress" {
metadata {
labels = {
istio-injection = "enabled"
}
name = "istio-ingress"
}
}
resource "helm_release" "istio-ingress" {
repository = local.istio_charts_url
chart = "gateway"
name = "istio-ingress"
namespace = kubernetes_namespace.istio-ingress.id
depends_on = [helm_release.istiod]
set {
name = "service.type"
value = "NodePort"
}
}
and application:
### blog page frontend
resource "kubernetes_service" "blog_page" {
metadata {
name = "blog-page"
namespace = kubernetes_namespace.istio-ingress.id
}
spec {
port {
port = 5000
name = "http"
}
selector = {
app = "blog_page"
}
}
}
resource "kubernetes_deployment" "blog_page_v1" {
metadata {
name = "blog-page-v1"
namespace = kubernetes_namespace.istio-ingress.id
}
spec {
replicas = 1
selector {
match_labels = {
app = "blog_page"
version = "v1"
}
}
template {
metadata {
labels = {
app = "blog_page"
version = "v1"
}
}
spec {
container {
image = "thiv17/blog-service:v1"
name = "blog-page"
image_pull_policy = "Always"
port {
container_port = 5000
}
}
}
}
}
}
resource "kubernetes_ingress" "istio-app" {
metadata {
name = "istio-app"
namespace = kubernetes_namespace.istio-ingress.id
annotations = {
"kubernetes.io/ingress.class" = "istio"
}
}
spec {
rule {
http {
path {
path = "/*"
backend {
service_name = kubernetes_service.blog_page.metadata[0].name
service_port = kubernetes_service.blog_page.spec[0].port[0].port
}
}
}
}
}
}
I expected to be able to access the service via the NodePort on the node IP 10.0.83.140.
kubectl describe svc istio-ingress --namespace=istio-ingress
-----
Port: http2 80/TCP
TargetPort: 80/TCP
NodePort: http2 30968/TCP
Endpoints: 10.0.91.237:80
Port: https 443/TCP
kubectl get pods --selector="app=istio-ingress" --namespace=istio-ingress --output=wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
istio-ingress-5bd77ffbdf-h25vs 1/1 Running 0 24h 10.0.91.237 ip-10-0-83-140.us-west-2.compute.internal <none> <none>
However, when I SSH to this node, even though the node is listening on port 30968:
[ec2-user@ip-10-0-83-140 ~]$ netstat -plan | grep 30968
(No info could be read for "-p": geteuid()=1000 but you should be root.)
tcp 0 0 0.0.0.0:30968 0.0.0.0:* LISTEN -
But I can't access the address http://localhost:30968
* Trying ::1:30968...
* connect to ::1 port 30968 failed: Connection refused
* Failed to connect to localhost port 30968 after 0 ms: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 30968 after 0 ms: Connection refused
[ec2-user@ip-10-0-83-140 ~]$
I also tried the public IP (after opening port 30968 in the security group) and even switched the service type to LoadBalancer, but still could not access it.
Other debug info
kubectl get pods --namespace=istio-ingress
NAME READY STATUS RESTARTS AGE
blog-api-v1-86789596cf-8rh2j 2/2 Running 0 7h58m
blog-page-v1-54d45997f8-q6h6l 2/2 Running 0 7h58m
blog-page-v2-74b6d4b7c9-bgdrm 2/2 Running 0 7h58m
istio-ingress-5bd77ffbdf-h25vs 1/1 Running 0 24h
kubectl describe ingress istio-app --namespace=istio-ingress
Name: istio-app
Labels: <none>
Namespace: istio-ingress
Address:
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
Host Path Backends
---- ---- --------
*
/* blog-page:5000 (10.0.81.70:5000,10.0.95.8:5000)
Annotations: kubernetes.io/ingress.class: istio
Events: <none>
Full code:
https://gitlab.com/jimmy-pet-projects/terraform-eks-with-monitoring/-/blob/main/modules/kubernetes/istio.tf
https://gitlab.com/jimmy-pet-projects/terraform-eks-with-monitoring/-/blob/main/modules/kubernetes/istio_app.tf
I found the issue: the Helm release name should be istio-ingressgateway. I don't understand why the documentation uses istio-ingress:
$ helm install istio-ingress istio/gateway -n istio-ingress --wait
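In the Terraform setup from the question, the equivalent change would be renaming the helm_release (a sketch based on the gateway block above; the chart derives the generated Service and Deployment names from the release name):
resource "helm_release" "istio-ingress" {
  repository = local.istio_charts_url
  chart      = "gateway"
  name       = "istio-ingressgateway"
  namespace  = kubernetes_namespace.istio-ingress.id
  depends_on = [helm_release.istiod]
  set {
    name  = "service.type"
    value = "NodePort"
  }
}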
For all those who are facing issues with the Istio template: here is a working version. Since I faced a couple of issues with the original template, I compiled this one for my own use case. I hope it's helpful.
provider "helm" {
kubernetes {
config_path = "~/.kube/config"
}
}
provider "kubernetes" {
config_path = "~/.kube/config"
}
locals {
istio_charts_url = "https://istio-release.storage.googleapis.com/charts"
}
resource "kubernetes_namespace" "istio_system" {
metadata {
name = "istio-system"
labels = {
istio-injection = "enabled"
}
}
}
resource "helm_release" "istio-base" {
repository = local.istio_charts_url
chart = "base"
name = "istio-base"
namespace = kubernetes_namespace.istio_system.metadata.0.name
version = ">= 1.12.1"
timeout = 120
cleanup_on_fail = true
force_update = false
}
resource "helm_release" "istiod" {
repository = local.istio_charts_url
chart = "istiod"
name = "istiod"
namespace = kubernetes_namespace.istio_system.metadata.0.name
version = ">= 1.12.1"
timeout = 120
cleanup_on_fail = true
force_update = false
set {
name = "meshConfig.accessLogFile"
value = "/dev/stdout"
}
depends_on = [helm_release.istio-base]
}
resource "helm_release" "istio-ingress" {
repository = local.istio_charts_url
chart = "gateway"
name = "istio-ingress"
namespace = kubernetes_namespace.istio_system.metadata.0.name
version = ">= 1.12.1"
timeout = 500
cleanup_on_fail = true
force_update = false
depends_on = [helm_release.istiod]
}

Terraform: retrieve the nginx ingress controller Load Balancer IP

I'm trying to get the nginx ingress controller load balancer IP in Azure AKS. I figured I would use the kubernetes provider via:
data "kubernetes_service" "nginx_service" {
metadata {
name = "${local.ingress_name}-ingress-nginx-controller"
namespace = local.ingress_ns
}
depends_on = [helm_release.ingress]
}
However, I'm not seeing the IP address; this is what I get back:
nginx_service = [
+ {
+ cluster_ip = "10.0.165.249"
+ external_ips = []
+ external_name = ""
+ external_traffic_policy = "Local"
+ health_check_node_port = 31089
+ load_balancer_ip = ""
+ load_balancer_source_ranges = []
+ port = [
+ {
+ name = "http"
+ node_port = 30784
+ port = 80
+ protocol = "TCP"
+ target_port = "http"
},
+ {
+ name = "https"
+ node_port = 32337
+ port = 443
+ protocol = "TCP"
+ target_port = "https"
},
]
+ publish_not_ready_addresses = false
+ selector = {
+ "app.kubernetes.io/component" = "controller"
+ "app.kubernetes.io/instance" = "nginx-ingress-internal"
+ "app.kubernetes.io/name" = "ingress-nginx"
}
+ session_affinity = "None"
+ type = "LoadBalancer"
},
]
However, when I pull the service via kubectl, I can get the IP address:
kubectl get svc nginx-ingress-internal-ingress-nginx-controller -n nginx-ingress -o json | jq -r '.status.loadBalancer.ingress[].ip'
10.141.100.158
Is this a limitation of the kubernetes provider on AKS? If so, what workaround have other people used? My end goal is to use the IP to configure the application gateway backend.
I guess I can use local-exec, but that seems hacky. However, it might be my only option at the moment.
Thanks,
Jerry
Although I strongly advise against creating resources inside Kubernetes with Terraform, you can do this:
Create a public IP with Terraform -> create the ingress-nginx service inside Kubernetes with Terraform and pass the annotations and loadBalancerIP from your Terraform resources. The final manifest should look like this:
apiVersion: v1
kind: Service
metadata:
annotations:
service.beta.kubernetes.io/azure-load-balancer-resource-group: myResourceGroup
name: ingress-nginx-controller
spec:
loadBalancerIP: <YOUR_STATIC_IP>
type: LoadBalancer
Terraform could look like this:
resource "kubernetes_service" "ingress_nginx" {
metadata {
name = "tingress-nginx-controller"
annotations {
"service.beta.kubernetes.io/azure-load-balancer-resource-group" = "${azurerm_resource_group.YOUR_RG.name}"
}
spec {
selector = {
app = <PLACEHOLDER>
}
port {
port = <PLACEHOLDER>
target_port = <PLACEHOLDER>
}
type = "LoadBalancer"
load_balancer_ip = "${azurerm_public_ip.YOUR_IP.ip_address}"
}
}
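The referenced public IP could be declared along these lines (a sketch; the resource names match the placeholders above, and a Standard-SKU static IP is assumed so it works with the AKS standard load balancer):
resource "azurerm_public_ip" "YOUR_IP" {
  name                = "nginx-ingress-ip" # assumed name
  resource_group_name = azurerm_resource_group.YOUR_RG.name
  location            = azurerm_resource_group.YOUR_RG.location
  allocation_method   = "Static"
  sku                 = "Standard"
}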
Unfortunately, this is for an internal ingress, not a public-facing one, and the IP is allocated dynamically. We currently don't want to use static IPs.
This is what I came up with:
module "load_balancer_ip" {
count = local.create_ingress ? 1 : 0
source = "github.com/matti/terraform-shell-resource?ref=v1.5.0"
command = "./scripts/get_load_balancer_ip.sh"
environment = {
KUBECONFIG = base64encode(module.aks.kube_admin_config_raw)
}
depends_on = [local_file.load_balancer_ip_script]
}
resource "local_file" "load_balancer_ip_script" {
count = local.create_ingress ? 1 : 0
filename = "./scripts/get_load_balancer_ip.sh"
content = <<-EOT
#!/bin/bash
echo $KUBECONFIG | base64 --decode > kubeconfig
kubectl get svc -n ${local.ingress_ns} ${local.ingress_name}-ingress-nginx-controller --kubeconfig kubeconfig -o=jsonpath='{.status.loadBalancer.ingress[0].ip}'
rm -f kubeconfig 2>&1 >/dev/null
EOT
}
output "nginx_ip" {
description = "IP address of the internal nginx controller"
value = local.create_ingress ? module.load_balancer_ip[0].content : null
}

Tiller: dial tcp 127.0.0.1:80: connect: connection refused

Since I upgraded the versions in my EKS Terraform script, I keep getting error after error.
Currently I am stuck on this error:
Error: Get http://localhost/api/v1/namespaces/kube-system/serviceaccounts/tiller: dial tcp 127.0.0.1:80: connect: connection refused
Error: Get http://localhost/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/tiller: dial tcp 127.0.0.1:80: connect: connection refused
The script works fine and I can still use it with the old version, but I am trying to upgrade the cluster version.
provider.tf
provider "aws" {
region = "${var.region}"
version = "~> 2.0"
assume_role {
role_arn = "arn:aws:iam::${var.target_account_id}:role/terraform"
}
}
provider "kubernetes" {
config_path = ".kube_config.yaml"
version = "~> 1.9"
}
provider "helm" {
service_account = "${kubernetes_service_account.tiller.metadata.0.name}"
namespace = "${kubernetes_service_account.tiller.metadata.0.namespace}"
kubernetes {
config_path = ".kube_config.yaml"
}
}
terraform {
backend "s3" {
}
}
data "terraform_remote_state" "state" {
backend = "s3"
config = {
bucket = "${var.backend_config_bucket}"
region = "${var.backend_config_bucket_region}"
key = "${var.name}/${var.backend_config_tfstate_file_key}" # var.name == CLIENT
role_arn = "${var.backend_config_role_arn}"
skip_region_validation = true
dynamodb_table = "terraform_locks"
encrypt = "true"
}
}
kubernetes.tf
resource "kubernetes_service_account" "tiller" {
#depends_on = ["module.eks"]
metadata {
name = "tiller"
namespace = "kube-system"
}
automount_service_account_token = "true"
}
resource "kubernetes_cluster_role_binding" "tiller" {
depends_on = ["module.eks"]
metadata {
name = "tiller"
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "cluster-admin"
}
subject {
kind = "ServiceAccount"
name = "tiller"
api_group = ""
namespace = "kube-system"
}
}
terraform version: 0.12.12
eks module version: 6.0.2
It means the server: entry in your .kube_config.yaml is pointing to the wrong port (and perhaps even the wrong protocol, as normal Kubernetes communication travels over HTTPS and is secured via mutual TLS authentication), or there is no longer a proxy listening on localhost:80, or perhaps the --insecure-port used to be 80 and is now 0 (as is strongly recommended).
Regrettably, without more specifics, no one can guess what the correct value was or what it should be changed to.
I am sure you need to set up the Kubernetes provider in your Terraform configuration.
Something like this:
provider "kubernetes" {
config_path = module.EKS_cluster.kubeconfig_filename
}
This happened to me when I misconfigured the credentials Terraform uses for the cluster and had no access to it. If you correctly configure kubectl / whatever you are using to authenticate, this should be solved.
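One way to avoid a stale kubeconfig altogether is to authenticate the provider directly against the cluster, mirroring the provider blocks used elsewhere on this page (a sketch; it assumes a kubernetes provider version that supports exec plugins and EKS module outputs named like these):
provider "kubernetes" {
  # Module output names are assumptions; adjust to your EKS module version.
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}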

How to Horizontal Autoscaler a Kubernetes Deployment

EDIT:
SOLUTION: I forgot to add target_cpu_utilization_percentage to the Autoscaler.tf file.
I want a web service in Python (or another language) running on Kubernetes, but with autoscaling.
I created a Deployment and a Horizontal Pod Autoscaler, but it is not working.
I'm using Terraform to configure Kubernetes.
I have these files:
Deployments.tf
resource "kubernetes_deployment" "rui-test" {
metadata {
name = "rui-test"
labels {
app = "rui-test"
}
}
spec {
strategy = {
type = "RollingUpdate"
rolling_update = {
max_unavailable = "26%" # This is not working
}
}
selector = {
match_labels = {
app = "rui-test"
}
}
template = {
metadata = {
labels = {
app = "rui-test"
}
}
spec = {
container {
name = "python-test1"
image = "***************************"
}
}
}
}
}
Autoscaler.tf
resource "kubernetes_horizontal_pod_autoscaler" "test-rui" {
metadata {
name = "test-rui"
}
spec {
max_replicas = 10 # THIS IS NOT WORKING
min_replicas = 3 # THIS IS NOT WORKING
scale_target_ref {
kind = "Deployment"
name = "test-rui" # Name of deployment
}
}
}
Service.tf
resource "kubernetes_service" "rui-test" {
metadata {
name = "rui-test"
labels {
app = "rui-test"
}
}
spec {
selector {
app = "rui-test"
}
type = "LoadBalancer" # Use 'cluster_ip = "None"' or 'type = "LoadBalancer"'
port {
name = "http"
port = 8080
}
}
}
When I run kubectl get hpa I see this:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
rui-test Deployment/rui-test <unknown>/80% 1 3 1 1h
Instead of:
rui-test Deployment/rui-test <unknown>/79% 3 10 1 1h
That is what I want.
But if I run kubectl autoscale deployment rui-test --min=3 --max=10 --cpu-percent=81 I see this:
Error from server (AlreadyExists): horizontalpodautoscalers.autoscaling "rui-test" already exists
This is what appears in Kubernetes.
You are missing the metrics server. Kubernetes needs to determine current CPU/Memory usage so that it can autoscale up and down.
One way to know if you have the metrics server installed is to run:
$ kubectl top node
$ kubectl top pod
The Horizontal Pod Autoscaler depends on resource requests being configured for your Deployment.
From the documentation:
Please note that if some of the pod’s containers do not have the relevant resource request set, CPU utilization for the pod will not be defined and the autoscaler will not take any action for that metric.
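Putting that together with the solution noted in the EDIT above: the HPA spec needs a CPU target, the scale_target_ref has to match the Deployment's metadata.name, and the container needs a CPU request so utilization can be computed. A rough sketch (values are illustrative, and the exact resources syntax depends on the kubernetes provider version):
resource "kubernetes_horizontal_pod_autoscaler" "test_rui" {
  metadata {
    name = "test-rui"
  }
  spec {
    min_replicas                      = 3
    max_replicas                      = 10
    target_cpu_utilization_percentage = 80 # illustrative target
    scale_target_ref {
      api_version = "apps/v1"
      kind        = "Deployment"
      name        = "rui-test" # must match the Deployment's metadata.name ("rui-test" above, not "test-rui")
    }
  }
}
# In the Deployment's container block, also set a CPU request, for example:
#   resources {
#     requests = {
#       cpu = "250m"
#     }
#   }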