Terraform Cloud, multiple applies, EKS - kubernetes

I'm having the following issue.
I'm trying to deploy an EKS cluster with EKS addons (VPC CNI, kube-proxy) and Kubernetes addons (autoscaler, Fluent Bit). My ADO (Azure DevOps) repo that holds the .tf files is connected to Terraform Cloud, meaning my state is remote. I've recently found out that Kubernetes/Terraform won't let you deploy an EKS cluster and its addons in the same run, for some reason (I would get many random errors, at random times), so I had to run a separate terraform apply for EKS and for the addons.
So, I've decided to modularize my code.
Before, my main folder looked like this:
Deployment
├── main.tf
├── eks.tf
└── addons.tf
Now, my folder looks like this:
Deployment
├── eks_deploy
│   ├── main.tf
│   └── eks.tf
└── addons_deploy
    ├── main.tf
    └── addons.tf
And so, I initialize the same remote backend in both. So far, so good. I went ahead with a terraform apply in my eks_deploy folder, and it deployed without problems: a clean EKS cluster with no addons. Then it was time to deploy the addons.
And that's where we have a problem.
My main.tf file is identical in both folders, and it looks like this:
terraform {
  backend "remote" {}

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.66.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.7.1"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.4.1"
    }
  }
}

data "aws_eks_cluster" "cluster" {
  name = module.eks-ssp.eks_cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks-ssp.eks_cluster_id
}

# I am aware you're not supposed to hardcode your creds
provider "aws" {
  access_key = "xxx"
  secret_key = "xxx"
  region     = "xxx"

  assume_role {
    role_arn = "xxx"
  }
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    token                  = data.aws_eks_cluster_auth.cluster.token
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  }
}
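Since the backend "remote" block above is empty, the workspace details get supplied as partial backend configuration at terraform init time. A rough sketch of what that looks like in each folder; the organization and workspace names below are placeholders:

# backend.hcl, passed via `terraform init -backend-config=backend.hcl`
organization = "my-org"   # placeholder

workspaces {
  name = "eks-deploy"     # placeholder workspace name
}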
The EKS cluster deployed without problems because its folder has an eks.tf file containing the module and all the info needed to deploy a cluster. However, my addon deployment throws the following errors:
╷
│ Error: Reference to undeclared module
│
│ on addons.tf line 60, in module "eks-ssp-kubernetes-addons":
│ 60: depends_on = [module.eks-ssp.self_managed_node_groups]
│
│ No module call named "eks-ssp" is declared in the root module.
╵
╷
│ Error: Reference to undeclared module
│
│ on main.tf line 22, in data "aws_eks_cluster" "cluster":
│ 22: name = module.eks-ssp.eks_cluster_id
│
│ No module call named "eks-ssp" is declared in the root module.
╵
╷
│ Error: Reference to undeclared module
│
│ on main.tf line 26, in data "aws_eks_cluster_auth" "cluster":
│ 26: name = module.eks-ssp.eks_cluster_id
│
│ No module call named "eks-ssp" is declared in the root module.
╵
This is completely understandable, since the EKS cluster DOES NOT exist in the addon configuration, so the addon deployment has no clue which cluster to actually deploy those addons to.
So my question is: how do I perform two different applies for what is essentially the same resource (EKS), with each deployment fully aware of the other (working as if they were in the same files and the same apply)? People have mentioned Terragrunt, but I still don't understand how I would use it in my case, so if that's the solution you propose as well, please describe how it would be used. There is also the question of how I would connect the same repo, with two different folders/deployments and separate applies, to TF Cloud. Does TF Cloud even allow such a thing? At this point, I'm starting to think that a completely separate workspace and hardcoded EKS values inside addons.tf are the only way. Thank you.
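In case it helps frame an answer: would something like the terraform_remote_state data source be the right direction? A minimal sketch of what I imagine that would look like, assuming eks_deploy publishes the cluster id as an output (the output, organization, and workspace names below are placeholders) and the two TF Cloud workspaces are allowed to share state:

# In eks_deploy, publish the value the addons need:
output "eks_cluster_id" {
  value = module.eks-ssp.eks_cluster_id
}

# In addons_deploy/main.tf, read it back instead of referencing module.eks-ssp:
data "terraform_remote_state" "eks" {
  backend = "remote"

  config = {
    organization = "my-org"   # placeholder
    workspaces = {
      name = "eks-deploy"     # placeholder workspace name
    }
  }
}

data "aws_eks_cluster" "cluster" {
  name = data.terraform_remote_state.eks.outputs.eks_cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = data.terraform_remote_state.eks.outputs.eks_cluster_id
}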

Related

Issue with Terraform accessing list value of a key in YAML file

I am deploying Azure Service Bus using Terraform and a YAML config file. I am creating the Azure Service Bus namespace, network rules for the Service Bus, and a Service Bus authorization rule for the namespace using Terraform. However, I want to define multiple topics, and multiple subscriptions under each topic, in a YAML file that Terraform reads as parameters when creating the "topic" and "subscription" resources. I have defined the multiple subscriptions as a list value under each topic. The topics are created successfully but the multiple subscriptions are not. The error, YAML, and Terraform config are given below.
Error: Incorrect attribute value type
│
│ on main.tf line 215, in resource "azurerm_servicebus_subscription" "subscription":
│ 215: name = each.value.servicebus_subscription
│ ├────────────────
│ │ each.value.servicebus_subscription is tuple with 2 elements
│
│ Inappropriate value for attribute "name": string required.
con.yaml
#-------
servicebus:
  - servicebus_topic: tesTopic1
    #enable_partitioning: "true"
    servicebus_subscription: ['test-service1', 'test-service1']
  - servicebus_topic: testTopic2
    servicebus_subscription: ['test-db1', 'test-service2']
pub_sub.tf
resource "azurerm_servicebus_subscription" "subscription" {
for_each = { for subscriptions in local.service_bus_conf : subscriptions.servicebus_topic=> subscriptions}
depends_on = [azurerm_servicebus_topic.topic]
name = each.value.servicebus_subscription
topic_id = data.azurerm_servicebus_topic.topic[each.value.servicebus_topic].id
}
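A sketch of the usual workaround for this shape of data, assuming the goal is one subscription resource per (topic, subscription) pair: flatten the nested lists first so each for_each value carries a single subscription string. The local name and map key format below are invented, and note that the sample YAML repeats 'test-service1', so the pairs would need to be made unique:

locals {
  # One object per (topic, subscription) pair, so each.value.subscription is a
  # single string rather than the whole tuple.
  topic_subscriptions = flatten([
    for t in local.service_bus_conf : [
      for s in t.servicebus_subscription : {
        topic        = t.servicebus_topic
        subscription = s
      }
    ]
  ])
}

resource "azurerm_servicebus_subscription" "subscription" {
  for_each = {
    for ts in local.topic_subscriptions : "${ts.topic}-${ts.subscription}" => ts
  }

  name               = each.value.subscription
  topic_id           = data.azurerm_servicebus_topic.topic[each.value.topic].id
  max_delivery_count = 1 # required by the azurerm provider; the value here is just an example

  depends_on = [azurerm_servicebus_topic.topic]
}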

Are AWS EFS Mount Targets supported in AWS Local Zones?

I am trying to create an EFS mount target in the us-east-1-atl-1a AWS Local Zone using Terraform, but I received the following error. I attempted to create it manually using the UI, but I don't see an option to select us-east-1-atl-1a as an AZ (see screenshot). Does anyone know if this AWS Local Zone supports EFS mount targets? The AWS Local Zones info page doesn't mention EFS at all.
Terraform Error:
│ Error: UnsupportedAvailabilityZone: Mount targets are not supported in subnet's availability zone.
│ {
│ RespMetadata: {
│ StatusCode: 400,
│ RequestID: "23192f37-77e6-421b-b623-4a2b6dfb6217"
│ },
│ ErrorCode: "UnsupportedAvailabilityZone",
│ Message_: "Mount targets are not supported in subnet's availability zone."
│ }
│
│ with aws_efs_mount_target.efs-mounts[0],
│ on efs.tf line 7, in resource "aws_efs_mount_target" "efs-mounts":
│ 7: resource "aws_efs_mount_target" "efs-mounts" {
│
╵
╷
│ Error: creating EKS Cluster (d115): UnsupportedAvailabilityZoneException: Cannot create cluster 'd115' because us-east-1-atl-1a, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f
│ {
│ RespMetadata: {
│ StatusCode: 400,
│ RequestID: "51476966-09d3-4976-b8a1-f381b9c29c17"
│ },
│ ClusterName: "d115",
│ Message_: "Cannot create cluster 'd115' because us-east-1-atl-1a, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f",
│ ValidZones: [
│ "us-east-1a",
│ "us-east-1b",
│ "us-east-1c",
│ "us-east-1d",
│ "us-east-1f"
│ ]
│ }
EFS Mount Targets screenshot
EFS is not currently listed as a service available in Local Zones. You can see the list of services here - https://aws.amazon.com/about-aws/global-infrastructure/localzones/features/
EBS and FSx are the only storage options currently available.
This is not supported; only standard Availability Zones are supported, as the error indicates: us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1f.

HTTP to HTTPS redirects on Kubernetes v1.22 running in GCP

I have a GCP cluster that was running v1.21 and I have upgraded it to v1.22. I was making some deprecated API calls and have managed to get rid of all of them but one.
$ kubent
2:02PM INF >>> Kube No Trouble `kubent` <<<
__________________________________________________________________________________________
>>> Deprecated APIs removed in 1.22 <<<
------------------------------------------------------------------------------------------
KIND NAMESPACE NAME API_VERSION REPLACE_WITH (SINCE)
Ingress elastic kibana-kibana networking.k8s.io/v1beta1 networking.k8s.io/v1 (1.19.0)
__________________________________________________________________________________________
I am pretty sure I have found where this is defined in my terraform scripts and I have tried to upgrade it following the release notes but to no avail.
resource "kubernetes_manifest" "nginx_frontend_config" {
manifest = {
"apiVersion" = "networking.gke.io/v1beta1"
"kind" = "FrontendConfig"
"metadata" = {
"name" = "nginx-frontend-config"
"namespace" = kubernetes_namespace.nginx.metadata[0].name
}
"spec" = {
"redirectToHttps" = {
"enabled" = true
"responseCodeName" = "FOUND"
}
"sslPolicy" = google_compute_ssl_policy.default.name
}
}
}
I have tried to upgrade the apiVersion from .../v1beta1 to .../v1 but I get the following error when running the terraform scripts:
╷
│ Error: Failed to determine GroupVersionResource for manifest
│
│ with kubernetes_manifest.nginx_frontend_config,
│ on nginx.tf line 83, in resource "kubernetes_manifest" "nginx_frontend_config":
│ 83: resource "kubernetes_manifest" "nginx_frontend_config" {
│
│ cannot select exact GV from REST mapper
╵
I have looked everywhere and I couldn't find a resource on how to define an HTTP -> HTTPS redirect in Kubernetes v1.22. The official GCP guides reference only the v1beta1 version, and the Ingress Migration Guide says to use the v1 version, but that doesn't work for me.
P.S. I have also tried networking.k8s.io/v1, but it comes back with no matches for kind "FrontendConfig" in group "networking.k8s.io" when I run the terraform scripts.
How do I define a FrontendConfig for a redirect post v1.22?

Terraform Kubernetes metadata name from variable

I'm trying to inject a variable into the Terraform kubernetes_service -> metadata -> name field. I'm getting the following error:
│ Error: metadata.0.name a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
│
│ with module.collector.kubernetes_deployment.dp_collector,
│ on modules/collector/main.tf line 3, in resource "kubernetes_deployment" "dp_collector":
│ 3: name = var.name
Is there any way to do that? From the error description I guess I can't.
var.name = "app_collector"
Why do I want to do this? I have a couple of microservices whose deployments are identical except for ports and names, so I want to abstract the service into a module.
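A minimal workaround sketch, assuming the underscore is the only thing failing the DNS-1123 check in names like "app_collector": normalize the variable before it reaches metadata.name (the local name here is invented):

variable "name" {
  type    = string
  default = "app_collector"
}

locals {
  # DNS-1123 subdomains allow only lower case alphanumerics, '-' and '.',
  # so lower-case the value and swap underscores for hyphens.
  k8s_name = lower(replace(var.name, "_", "-")) # "app_collector" -> "app-collector"
}

# then, inside the module's resources:
#   metadata {
#     name = local.k8s_name
#   }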

Airflow scheduler cannot connect to Kubernetes service API

I am trying to set up Airflow with the Kubernetes executor, and on scheduler container startup it hangs for a while and then I get the HTTPS timeout error below. The IP address in the message is correct, and inside the container I can run curl kubernetes:443, curl 10.96.0.1:443, or nc -zv 10.96.0.1 443, so I assume there is no firewall or similar blocking access.
I am using local Kubernetes as well as AWS EKS and get the same error; I can see that the IP changes between clusters.
I have searched Google for a solution but did not find similar cases.
│ File "/usr/local/lib/python3.6/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 335, in run │
│ self.worker_uuid, self.kube_config) │
│ File "/usr/local/lib/python3.6/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 359, in _run │
│ **kwargs): │
│ File "/usr/local/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 144, in stream │
│ for line in iter_resp_lines(resp): │
│ File "/usr/local/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 48, in iter_resp_lines │
│ for seg in resp.read_chunked(decode_content=False): │
│ File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 781, in read_chunked │
│ self._original_response.close() │
│ File "/usr/local/lib/python3.6/contextlib.py", line 99, in __exit__ │
│ self.gen.throw(type, value, traceback) │
│ File "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 430, in _error_catcher │
│ raise ReadTimeoutError(self._pool, None, "Read timed out.") │
│ urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.96.0.1', port=443): Read timed out.
update: I found my problem, but no solution yet.
https://github.com/kubernetes-client/python/issues/990
There is an option to set the value via an environment variable. In your charts/airflow.yaml file, you can set the variable as follows, and that should solve your problem:
AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: {"_request_timeout" : [50, 50]}
PR Reference: https://github.com/apache/airflow/pull/6643
Problem Discussion: https://issues.apache.org/jira/browse/AIRFLOW-6040
airflow.yaml full code
airflow:
  image:
    repository: airflow-docker-local
    tag: 1
  executor: Kubernetes
  service:
    type: LoadBalancer
  config:
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://postgres:airflow#airflow-postgresql:5432/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://postgres:airflow#airflow-postgresql:5432/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:airflow#airflow-redis-master:6379/0
    AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: my_s3_connection
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://xxx-airflow/logs
    AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC: 25
    AIRFLOW__CORE__LOAD_EXAMPLES: True
    AIRFLOW__WEBSERVER__EXPOSE_CONFIG: True
    AIRFLOW__CORE__FERNET_KEY: -xyz=
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-docker-local
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: Never
    AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow
    AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: airflow
    AIRFLOW__KUBERNETES__NAMESPACE: airflow
    AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: {"_request_timeout" : [50, 50]}
persistence:
  enabled: true
  existingClaim: ''
workers:
  enabled: true
postgresql:
  enabled: true
redis:
  enabled: true