Please have some grace because I just started learning AWS. I am trying to launch a PostgreSQL DB from Terraform and I keep running into this subnet error:
Error: error creating RDS cluster: DBSubnetGroupNotFoundFault: DB subnet group 'cse-cr' does not exist.
│ status code: 404, request id: 6461e755-8118-41ee-8baf-5678e28c39aa
│
│ with aws_rds_cluster.secondary,
│ on maindb.tf line 67, in resource "aws_rds_cluster" "secondary":
│ 67: resource "aws_rds_cluster" "secondary" {
I tried adding a second resource section specifically for the secondary cluster/instance but that also did not work. What am I doing wrong...?
Here is my code:
provider "aws" {
alias = "primary"
region = "us-east-2"
}
provider "aws" {
alias = "secondary"
region = "us-east-1"
}
resource "aws_rds_global_cluster" "example" {
global_cluster_identifier = "global-test"
engine = "aurora-postgresql"
engine_version = "13.4"
database_name = "example_db"
}
variable "database_name" {
description = "What should RDS name the initial db?"
default = "variableWithSomeDefault"
}
data "aws_subnet" "subnet1" {
id = "subnet-numhere"
}
data "aws_subnet" "subnet2" {
id = "subnet-numhere"
}
resource "aws_db_subnet_group" "default" {
name = "cse-cr"
description = "Private subnets for RDS instance"
subnet_ids = [data.aws_subnet.subnet1.id, data.aws_subnet.subnet2.id]
}
resource "aws_rds_cluster" "primary" {
provider = aws.primary
engine = "aurora-postgresql"
engine_version = "13.4"
cluster_identifier = "test-primary-cluster"
master_username = "username"
master_password = "somepass123"
database_name = var.database_name
global_cluster_identifier = aws_rds_global_cluster.example.global_cluster_identifier
db_subnet_group_name = "cse-cr"
skip_final_snapshot = true
}
resource "aws_rds_cluster_instance" "primary" {
provider = aws.primary
engine = "aurora-postgresql"
engine_version = "13.4"
identifier = "primaryinstancetest"
cluster_identifier = "${aws_rds_cluster.primary.cluster_identifier}"
instance_class = "db.r5.large"
db_subnet_group_name = "cse-cr"
}
resource "aws_rds_cluster" "secondary" {
provider = aws.secondary
engine = "aurora-postgresql"
engine_version = "13.4"
cluster_identifier = "test-secondary-cluster"
global_cluster_identifier = aws_rds_global_cluster.example.global_cluster_identifier
skip_final_snapshot = true
db_subnet_group_name = "cse-cr"
depends_on = [
aws_rds_cluster_instance.primary
]
}
resource "aws_rds_cluster_instance" "secondary" {
provider = aws.secondary
engine = "aurora-postgresql"
engine_version = "13.4"
identifier = "secondaryinstancetest"
cluster_identifier = "${aws_rds_cluster.secondary.cluster_identifier}"
instance_class = "db.r5.large"
db_subnet_group_name = "cse-cr"
}
Terraform tries to create the RDS cluster (aws_rds_cluster.secondary) before the DB subnet group (aws_db_subnet_group.default) has been provisioned. You need a dependency between them so Terraform can work out the correct provisioning order.
To solve this, I recommend referencing aws_db_subnet_group.default.name wherever a subnet group name is expected. For example:
resource "aws_rds_cluster_instance" "primary" {
provider = aws.primary
engine = "aurora-postgresql"
engine_version = "13.4"
identifier = "primaryinstancetest"
cluster_identifier = aws_rds_cluster.primary.cluster_identifier
instance_class = "db.r5.large"
db_subnet_group_name = aws_db_subnet_group.default.name # there is no need to hardcode "cse-cr" here
}
Or another example, for the secondary cluster instance:
resource "aws_rds_cluster_instance" "secondary" {
provider = aws.primary
engine = "aurora-postgresql"
engine_version = "13.4"
identifier = "secondaryinstancetest"
cluster_identifier = aws_rds_cluster.secondary.cluster_identifier
instance_class = "db.r5.large"
db_subnet_group_name = aws_db_subnet_group.default.name
}
Please note that I removed the secondary provider from the second cluster. You cannot have a subnet (or a VPC) that spans two different regions!
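The same referencing applies to the cluster resources themselves. Here is a sketch of the primary cluster from the code above, with only the subnet group reference changed:
resource "aws_rds_cluster" "primary" {
  provider                  = aws.primary
  engine                    = "aurora-postgresql"
  engine_version            = "13.4"
  cluster_identifier        = "test-primary-cluster"
  master_username           = "username"
  master_password           = "somepass123"
  database_name             = var.database_name
  global_cluster_identifier = aws_rds_global_cluster.example.global_cluster_identifier
  db_subnet_group_name      = aws_db_subnet_group.default.name # the reference creates the dependency
  skip_final_snapshot       = true
}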
Related
I am unable to create an RDS instance due to a failure around its subnets. I have different modules that I use to create my AWS infrastructure.
The main ones I am having trouble with are RDS and VPC; in the first one I create the database:
rds/main.tf
resource "aws_db_parameter_group" "education" {
name = "education"
family = "postgres14"
parameter {
name = "log_connections"
value = "1"
}
}
resource "aws_db_instance" "education" {
identifier = "education"
instance_class = "db.t3.micro"
allocated_storage = 5
engine = "postgres"
engine_version = "14.1"
username = "edu"
password = var.db_password
db_subnet_group_name = var.database_subnets
vpc_security_group_ids = var.rds_service_security_groups
parameter_group_name = aws_db_parameter_group.education.name
publicly_accessible = false
skip_final_snapshot = true
}
rds/variables.tf
variable "db_username" {
description = "RDS root username"
default = "someusername"
}
variable "db_password" {
description = "RDS root user password"
sensitive = true
}
variable "vpc_id" {
description = "VPC ID"
}
variable "rds_service_security_groups" {
description = "Comma separated list of security groups"
}
variable "database_subnets" {
description = "List of private subnets"
}
And in the latter I create the subnets:
vpc/main.tf
resource "aws_subnet" "private" {
vpc_id = aws_vpc.main.id
cidr_block = element(var.private_subnets, count.index)
availability_zone = element(var.availability_zones, count.index)
count = length(var.private_subnets)
tags = {
Name = "${var.name}-private-subnet-${var.environment}-${format("%03d", count.index+1)}"
Environment = var.environment
}
}
resource "aws_subnet" "public" {
vpc_id = aws_vpc.main.id
cidr_block = element(var.public_subnets, count.index)
availability_zone = element(var.availability_zones, count.index)
count = length(var.public_subnets)
map_public_ip_on_launch = true
tags = {
Name = "${var.name}-public-subnet-${var.environment}-${format("%03d", count.index+1)}"
Environment = var.environment
}
}
resource "aws_subnet" "database" {
vpc_id = aws_vpc.main.id
cidr_block = element(var.database_subnets, count.index)
availability_zone = element(var.availability_zones, count.index)
count = length(var.database_subnets)
tags = {
Name = "Education"
Environment = var.environment
}
}
vpc/variables.tf
variable "name" {
description = "the name of the stack"
}
variable "environment" {
description = "the name of the environment "
}
variable "cidr" {
description = "The CIDR block for the VPC."
}
variable "public_subnets" {
description = "List of public subnets"
}
variable "private_subnets" {
description = "List of private subnets"
}
variable "database_subnets" {
description = "Database subnetes"
}
variable "availability_zones" {
description = "List of availability zones"
}
Then in the root directory I have a main.tf file where I create everything. In there I call the rds module:
main.tf
module "rds" {
source = "./rds"
vpc_id = module.vpc.id
database_subnets = module.vpc.database_subnets
rds_service_security_groups = [module.security_groups.rds]
db_password = var.db_password
}
The error that I keep getting is this:
Error: Incorrect attribute value type
│
│ on rds\\main.tf line 19, in resource "aws_db_instance" "education":
│ 19: db_subnet_group_name = var.database_subnets
│ ├────────────────
│ │ var.database_subnets is tuple with 2 elements
│
│ Inappropriate value for attribute "db_subnet_group_name": string required.
Any idea how I can fix it?
You are trying to pass a list of subnet IDs into a parameter that expects a DB subnet group name.
You need to modify your RDS module to create a DB subnet group from the given subnet IDs, and then pass that group's name to the instance:
resource "aws_db_subnet_group" "education" {
name = "education"
subnet_ids = var.database_subnets
}
resource "aws_db_instance" "education" {
identifier = "education"
db_subnet_group_name = aws_db_subnet_group.education.name
...
}
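Note that var.database_subnets must be the list of subnet IDs. Your vpc module apparently already exposes them (the error shows the variable is a tuple with 2 elements); for reference, that output would look something like this sketch, assuming a vpc/outputs.tf alongside the subnets defined above:
output "database_subnets" {
  description = "IDs of the database subnets"
  value       = aws_subnet.database[*].id
}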
I am blocked while upgrading AWS RDS PostgreSQL from 10.18 to 13.4 using Terraform. Below is the Terraform code used and the errors. Please suggest a fix.
resource "aws_rds_cluster_instance" "aws_rds_cluster_instance" {
count = 3
identifier = "aws_rds_cluster_instance-${count.index}"
instance_class = "db.r4.large"
availability_zone = data.aws_availability_zones.available.names[count.index]
engine = "aurora-postgresql"
engine_version = aws_rds_cluster.aws_rds_cluster.engine_version
cluster_identifier = aws_rds_cluster.aws_rds_cluster.id
monitoring_interval = "60"
monitoring_role_arn = "arn:aws:iam::12345678910:role/role"
performance_insights_enabled = "true"
preferred_maintenance_window = "mon:12:20-mon:12:50"
promotion_tier = "1"
publicly_accessible = "false"
db_subnet_group_name = aws_db_subnet_group.db.name
tags = merge(map(
"system", "local.system_name",
"business_unit", "local.business_unit",
), local.base_tags)
lifecycle {
ignore_changes = [engine_version]
prevent_destroy = true
}
}
resource "aws_rds_cluster" "aws_rds_cluster" {
engine = "aurora-postgresql"
engine_version = "10.18"
cluster_identifier = "aws_rds_cluster_instance-cluster"
availability_zones = data.aws_availability_zones.available.names
database_name = "aws_rds_cluster_instance"
deletion_protection = true
master_username = var.db_username
master_password = var.db_password
backup_retention_period = 7
preferred_backup_window = "07:00-08:00"
preferred_maintenance_window = "mon:23:00-mon:23:30"
port = 5432
skip_final_snapshot = "true"
storage_encrypted = "true"
iam_database_authentication_enabled = "true"
kms_key_id = aws_kms_key.key.arn
vpc_security_group_ids = [aws_security_group.db.id]
db_subnet_group_name = aws_db_subnet_group.db.name
db_cluster_parameter_group_name = aws_rds_cluster_parameter_group.param_group.name
tags = merge(map(
"system", "local.system_name",
"business_unit", "local.business_unit",
), local.base_tags)
lifecycle {
ignore_changes = [engine_version]
prevent_destroy = true
}
}
I am getting the below error after changing the engine version and parameter group values:
Error: Failed to modify RDS Cluster (aws_rds_cluster_instance): InvalidParameterCombination: The AllowMajorVersionUpgrade flag must be present when upgrading to a new major version.
status code: 400, request id: 648f75a8-abcd-49be-z480-056e71e86e6c
on main.tf line 62, in resource "aws_rds_cluster" "aws_rds_cluster":
62: resource "aws_rds_cluster" "aws_rds_cluster" {
When I added allow_major_version_upgrade = true under the aws_rds_cluster_instance/aws_rds_cluster resources, I got the error below:
Error: Unsupported argument
on main.tf line 49, in resource "aws_rds_cluster_instance" "aws_rds_cluster_instance":
49: allow_major_version_upgrade = true
An argument named "allow_major_version_upgrade" is not expected here.
Error: Unsupported argument
on main.tf line 63, in resource "aws_rds_cluster" "aws_rds_cluster":
63: allow_major_version_upgrade = true
An argument named "allow_major_version_upgrade" is not expected here.
The allow_major_version_upgrade argument belongs to the schema of the aws_rds_cluster resource (not the cluster instance). Since the error says it is unsupported, your version of the AWS provider is too old to support that argument.
First, upgrade the AWS provider to at least version 3.8.0, which is the first version that supports that argument.
Then, you can supply that argument and value to the resource as expected:
resource "aws_rds_cluster" "aws_rds_cluster" {
allow_major_version_upgrade = true
...
}
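If you are unsure which provider version you are running, pin a minimum version and re-initialize. A minimal sketch of the constraint (the version is the minimum mentioned above):
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.8.0" # first release supporting allow_major_version_upgrade on aws_rds_cluster
    }
  }
}
After changing the constraint, run terraform init -upgrade so Terraform actually downloads the newer provider.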
I'm trying to provision Postgres in AWS and also create the database and roles sequentially using Terraform.
But I'm getting the exception below and am not able to create the role/DB.
terraform {
required_providers {
# postgresql = {
# source = "cyrilgdn/postgresql"
# version = "1.15.0"
# }
postgresql = {
source = "terraform-providers/postgresql"
version = ">=1.7.2"
}
helm = {
source = "hashicorp/helm"
version = "2.4.1"
}
aws = {
source = "hashicorp/aws"
version = "4.0.0"
}
}
}
resource "aws_db_instance" "database" {
identifier = "dev-test"
allocated_storage = 100
storage_type = "gp2"
engine = "postgres"
engine_version = "13.4"
port = 5432
instance_class = "db.t3.micro"
username = "postgres"
performance_insights_enabled = true
password = "postgres$123"
db_subnet_group_name = "some_name"
vpc_security_group_ids = ["sg_name"]
parameter_group_name = "default.postgres13"
publicly_accessible = true
delete_automated_backups = false
storage_encrypted = true
tags = {
Name = "dev-test"
}
skip_final_snapshot = true
}
#To create the "raw" database
provider "postgresql" {
version = ">=1.4.0"
database = "raw"
host = aws_db_instance.database.address
port = aws_db_instance.database.port
username = aws_db_instance.database.username
password = aws_db_instance.database.password
sslmode = "require"
connect_timeout = 15
superuser = false
expected_version = aws_db_instance.database.engine_version
}
#creation of the role
resource "postgresql_role" "application_role" {
provider = postgresql
name = "test"
login = true
password = "test$123"
encrypted_password = true
create_database = false
depends_on = [aws_db_instance.database]
}
Error -
Error: dial tcp 18.221.183.66:5432: i/o timeout
│
│ with postgresql_role.application_role,
│ on main.tf line 79, in resource "postgresql_role" "application_role":
│ 79: resource "postgresql_role" "application_role" {
│
╵
I noticed a few people saying that including the expected_version attribute with the latest provider version should work.
However, even with the expected_version attribute included, the issue still persists.
I need to provision Postgres in AWS and create the DB and roles.
What could be the issue with my script?
As per the documentation [1], you are missing the scheme argument in the postgresql provider configuration:
provider "postgresql" {
scheme = "awspostgres"
database = "raw"
host = aws_db_instance.database.address
port = aws_db_instance.database.port
username = aws_db_instance.database.username
password = aws_db_instance.database.password
sslmode = "require"
connect_timeout = 15
superuser = false
expected_version = aws_db_instance.database.engine_version
}
Additionally, I am not sure whether you can use database = "raw" or whether it has to be database = "postgres", which is the default value and therefore does not have to be specified.
One other note: I do not think you need to specify the provider meta-argument in every resource. You declare the provider once in the required_providers block (as you did for the aws provider) and configure it in a single provider block; every resource belonging to that provider will then use it by default. In other words, you should remove version = ">=1.4.0" from the provider "postgresql" block and provider = postgresql from the resource "postgresql_role" "application_role", and the code should still work.
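Putting that together, the role resource could look like this sketch, based on your code with the explicit provider meta-argument removed:
resource "postgresql_role" "application_role" {
  name               = "test"
  login              = true
  password           = "test$123"
  encrypted_password = true
  create_database    = false

  # Wait for the RDS instance before trying to connect and create the role.
  depends_on = [aws_db_instance.database]
}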
[1] https://registry.terraform.io/providers/cyrilgdn/postgresql/latest/docs#aws
I'm trying to deploy an EKS cluster with self-managed nodes using Terraform. While I can deploy the cluster with add-ons, VPC, subnets and all other resources, it always fails at the Helm releases:
Error: Kubernetes cluster unreachable: the server has asked for the client to provide credentials
with module.eks-ssp-kubernetes-addons.module.ingress_nginx[0].helm_release.nginx[0]
on .terraform/modules/eks-ssp-kubernetes-addons/modules/kubernetes-addons/ingress-nginx/main.tf line 19, in resource "helm_release" "nginx":
resource "helm_release" "nginx" {
This error repeats for metrics_server, lb_ingress, argocd, but cluster-autoscaler throws:
Warning: Helm release "cluster-autoscaler" was created but has a failed status.
with module.eks-ssp-kubernetes-addons.module.cluster_autoscaler[0].helm_release.cluster_autoscaler[0]
on .terraform/modules/eks-ssp-kubernetes-addons/modules/kubernetes-addons/cluster-autoscaler/main.tf line 1, in resource "helm_release" "cluster_autoscaler":
resource "helm_release" "cluster_autoscaler" {
My main.tf looks like this:
terraform {
backend "remote" {}
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 3.66.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = ">= 2.7.1"
}
helm = {
source = "hashicorp/helm"
version = ">= 2.4.1"
}
}
}
data "aws_eks_cluster" "cluster" {
name = module.eks-ssp.eks_cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
name = module.eks-ssp.eks_cluster_id
}
provider "aws" {
access_key = "xxx"
secret_key = "xxx"
region = "xxx"
assume_role {
role_arn = "xxx"
}
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
}
provider "helm" {
kubernetes {
host = data.aws_eks_cluster.cluster.endpoint
token = data.aws_eks_cluster_auth.cluster.token
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
}
}
My eks.tf looks like this:
module "eks-ssp" {
source = "github.com/aws-samples/aws-eks-accelerator-for-terraform"
# EKS CLUSTER
tenant = "DevOpsLabs2b"
environment = "dev-test"
zone = ""
terraform_version = "Terraform v1.1.4"
# EKS Cluster VPC and Subnet mandatory config
vpc_id = "xxx"
private_subnet_ids = ["xxx","xxx", "xxx", "xxx"]
# EKS CONTROL PLANE VARIABLES
create_eks = true
kubernetes_version = "1.19"
# EKS SELF MANAGED NODE GROUPS
self_managed_node_groups = {
self_mg = {
node_group_name = "DevOpsLabs2b"
subnet_ids = ["xxx","xxx", "xxx", "xxx"]
create_launch_template = true
launch_template_os = "bottlerocket" # amazonlinux2eks or bottlerocket or windows
custom_ami_id = "xxx"
public_ip = true # Enable only for public subnets
pre_userdata = <<-EOT
yum install -y amazon-ssm-agent \
systemctl enable amazon-ssm-agent && systemctl start amazon-ssm-agent \
EOT
disk_size = 10
instance_type = "t2.small"
desired_size = 2
max_size = 10
min_size = 0
capacity_type = "" # Optional Use this only for SPOT capacity as capacity_type = "spot"
k8s_labels = {
Environment = "dev-test"
Zone = ""
WorkerType = "SELF_MANAGED_ON_DEMAND"
}
additional_tags = {
ExtraTag = "t2x-on-demand"
Name = "t2x-on-demand"
subnet_type = "public"
}
create_worker_security_group = false # Creates a dedicated sec group for this Node Group
},
}
}
enable_amazon_eks_vpc_cni = true
amazon_eks_vpc_cni_config = {
addon_name = "vpc-cni"
addon_version = "v1.7.5-eksbuild.2"
service_account = "aws-node"
resolve_conflicts = "OVERWRITE"
namespace = "kube-system"
additional_iam_policies = []
service_account_role_arn = ""
tags = {}
}
enable_amazon_eks_kube_proxy = true
amazon_eks_kube_proxy_config = {
addon_name = "kube-proxy"
addon_version = "v1.19.8-eksbuild.1"
service_account = "kube-proxy"
resolve_conflicts = "OVERWRITE"
namespace = "kube-system"
additional_iam_policies = []
service_account_role_arn = ""
tags = {}
}
#K8s Add-ons
enable_aws_load_balancer_controller = true
enable_metrics_server = true
enable_cluster_autoscaler = true
enable_aws_for_fluentbit = true
enable_argocd = true
enable_ingress_nginx = true
depends_on = [module.eks-ssp.self_managed_node_groups]
}
The OP confirmed in a comment that the problem was resolved:
Of course. I think I found the issue. Doing "kubectl get svc" throws: "An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:iam::xxx:user/terraform_deploy is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::xxx:user/terraform_deploy"
Solved it by using my actual role, that's crazy. No idea why it was calling itself.
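Based on that comment and the error above, the assume_role block in the provider "aws" configuration appears to have pointed at an IAM user ARN instead of an assumable role. A minimal sketch of a corrected provider, with placeholder values for the region and account:
provider "aws" {
  region = "us-east-1" # placeholder

  assume_role {
    # Must be an IAM role the caller is allowed to assume, not an IAM user.
    role_arn = "arn:aws:iam::111122223333:role/terraform-deploy"
  }
}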
For a similar problem, see also this issue.
I solved this error by adding dependencies to the Helm installations.
With depends_on, Terraform waits for those modules to complete successfully before the Helm module runs.
module "nginx-ingress" {
depends_on = [module.eks, module.aws-load-balancer-controller]
source = "terraform-module/release/helm"
...}
module "aws-load-balancer-controller" {
depends_on = [module.eks]
source = "terraform-module/release/helm"
...}
module "helm_autoscaler" {
depends_on = [module.eks]
source = "terraform-module/release/helm"
...}
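If you install the charts with plain helm_release resources instead of the wrapper module, the same idea applies. A sketch, under the assumption that a module named module.eks creates the cluster:
resource "helm_release" "ingress_nginx" {
  name             = "ingress-nginx"
  repository       = "https://kubernetes.github.io/ingress-nginx"
  chart            = "ingress-nginx"
  namespace        = "ingress-nginx"
  create_namespace = true

  # Do not attempt the release until the cluster (and its nodes) exist.
  depends_on = [module.eks]
}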
I have a regional cluster for redundancy. In this cluster I want to create a node pool in just one zone of that region. Is this configuration possible? The reason I am trying this is that I want to run a service like RabbitMQ in just one zone to avoid split-brain, while my application services run in all zones of the region for redundancy.
I am using Terraform to create the cluster and node pools. Below is my config for creating the regional cluster and the zonal node pool:
resource "google_container_cluster" "regional_cluster" {
provider = google-beta
project = "my-project"
name = "my-cluster"
location = "us-central1"
node_locations = ["us-central1-a", "us-central1-b", "us-central1-c"]
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
}
resource "google_container_node_pool" "one_zone" {
project = google_container_cluster.regional_cluster.project
name = "zone-pool"
location = "us-central1-b"
cluster = google_container_cluster.regional_cluster.name
node_config {
machine_type = var.machine_type
image_type = var.image_type
disk_size_gb = 100
disk_type = "pd-standard"
}
}
This throws the following error message:
error creating NodePool: googleapi: Error 404: Not found: projects/my-project/zones/us-central1-b/clusters/my-cluster., notFound
I found out that location in google_container_node_pool should specify the cluster master's region/zone. To specify where the node pool's nodes actually run, node_locations should be used instead. Below is the config that worked:
resource "google_container_cluster" "regional_cluster" {
provider = google-beta
project = "my-project"
name = "my-cluster"
location = "us-central1"
node_locations = ["us-central1-a", "us-central1-b", "us-central1-c"]
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
}
resource "google_container_node_pool" "one_zone" {
project = google_container_cluster.regional_cluster.project
name = "zone-pool"
location = google_container_cluster.regional_cluster.location
node_locations = ["us-central1-b"]
cluster = google_container_cluster.regional_cluster.name
node_config {
machine_type = var.machine_type
image_type = var.image_type
disk_size_gb = 100
disk_type = "pd-standard"
}
}