Issues with Subnets in my secondary instance - postgresql

Please have some grace because I just started learning AWS. I am trying to launch a PostgreSQL DB from Terraform and I keep running into this subnet error
Error: error creating RDS cluster: DBSubnetGroupNotFoundFault: DB subnet group 'cse-cr' does not exist.
│ status code: 404, request id: 6461e755-8118-41ee-8baf-5678e28c39aa
│
│ with aws_rds_cluster.secondary,
│ on maindb.tf line 67, in resource "aws_rds_cluster" "secondary":
│ 67: resource "aws_rds_cluster" "secondary" {
I tried adding a second resource section specifically for the secondary cluster/instance but that also did not work. What am I doing wrong...?
Here is my code:
provider "aws" {
alias = "primary"
region = "us-east-2"
}
provider "aws" {
alias = "secondary"
region = "us-east-1"
}
resource "aws_rds_global_cluster" "example" {
global_cluster_identifier = "global-test"
engine = "aurora-postgresql"
engine_version = "13.4"
database_name = "example_db"
}
variable "database_name" {
description = "What should RDS name the initial db?"
default = "variableWithSomeDefault"
}
data "aws_subnet" "subnet1" {
id = "subnet-numhere"
}
data "aws_subnet" "subnet2" {
id = "subnet-numhere"
}
resource "aws_db_subnet_group" "default" {
name = "cse-cr"
description = "Private subnets for RDS instance"
subnet_ids = [data.aws_subnet.subnet1.id, data.aws_subnet.subnet2.id]
}
resource "aws_rds_cluster" "primary" {
provider = aws.primary
engine = "aurora-postgresql"
engine_version = "13.4"
cluster_identifier = "test-primary-cluster"
master_username = "username"
master_password = "somepass123"
database_name = var.database_name
global_cluster_identifier = aws_rds_global_cluster.example.global_cluster_identifier
db_subnet_group_name = "cse-cr"
skip_final_snapshot = true
}
resource "aws_rds_cluster_instance" "primary" {
provider = aws.primary
engine = "aurora-postgresql"
engine_version = "13.4"
identifier = "primaryinstancetest"
cluster_identifier = "${aws_rds_cluster.primary.cluster_identifier}"
instance_class = "db.r5.large"
db_subnet_group_name = "cse-cr"
}
resource "aws_rds_cluster" "secondary" {
provider = aws.secondary
engine = "aurora-postgresql"
engine_version = "13.4"
cluster_identifier = "test-secondary-cluster"
global_cluster_identifier = aws_rds_global_cluster.example.global_cluster_identifier
skip_final_snapshot = true
db_subnet_group_name = "cse-cr"
depends_on = [
aws_rds_cluster_instance.primary
]
}
resource "aws_rds_cluster_instance" "secondary" {
provider = aws.secondary
engine = "aurora-postgresql"
engine_version = "13.4"
identifier = "secondaryinstancetest"
cluster_identifier = "${aws_rds_cluster.secondary.cluster_identifier}"
instance_class = "db.r5.large"
db_subnet_group_name = "cse-cr"
}

Terraform tries to create the RDS cluster (aws_rds_cluster.secondary) before the DB subnet group (aws_db_subnet_group.default) has been provisioned. There has to be a dependency between them for Terraform to detect the correct provisioning order.
To solve this, I recommend referencing aws_db_subnet_group.default.name wherever a subnet group name is expected. For example:
resource "aws_rds_cluster_instance" "primary" {
provider = aws.primary
engine = "aurora-postgresql"
engine_version = "13.4"
identifier = "primaryinstancetest"
cluster_identifier = aws_rds_cluster.primary.cluster_identifier
instance_class = "db.r5.large"
db_subnet_group_name = aws_db_subnet_group.default.name # there is no need to hardcode "cse-cr" here
}
Or, as another example, in the secondary cluster instance:
resource "aws_rds_cluster_instance" "secondary" {
provider = aws.primary
engine = "aurora-postgresql"
engine_version = "13.4"
identifier = "secondaryinstancetest"
cluster_identifier = aws_rds_cluster.secondary.cluster_identifier
instance_class = "db.r5.large"
db_subnet_group_name = aws_rds_cluster.primary.cluster_identifier
}
Please note that I removed the secondary provider from the secondary instance. You cannot have a subnet (or a VPC) that spans two different regions!
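If you do want the secondary cluster to live in us-east-1, the same rule applies there: a separate DB subnet group has to exist in that region, created through the aliased us-east-1 provider and referenced by the secondary cluster. A minimal sketch, assuming two placeholder subnet IDs in us-east-1 (the names and IDs below are illustrative, not from the original code):
# Sketch only: a subnet group in the secondary region. Subnet IDs are placeholders.
resource "aws_db_subnet_group" "secondary" {
  provider    = aws.secondary
  name        = "cse-cr-secondary"
  description = "Private subnets for the secondary RDS cluster"
  subnet_ids  = ["subnet-aaaa1111", "subnet-bbbb2222"] # replace with real us-east-1 subnet IDs
}

resource "aws_rds_cluster" "secondary" {
  provider             = aws.secondary
  db_subnet_group_name = aws_db_subnet_group.secondary.name
  # ... rest of the secondary cluster arguments as above
}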

Related

Cannot provide RDS subnet through different terraform modules

I am unable to create an RDS instance due to a failure in creating a subnet. I have different modules that I use to create an AWS infrastructure.
The main ones I am having trouble with are RDS and VPC. In the first one I create the database:
rds/main.tf
resource "aws_db_parameter_group" "education" {
name = "education"
family = "postgres14"
parameter {
name = "log_connections"
value = "1"
}
}
resource "aws_db_instance" "education" {
identifier = "education"
instance_class = "db.t3.micro"
allocated_storage = 5
engine = "postgres"
engine_version = "14.1"
username = "edu"
password = var.db_password
db_subnet_group_name = var.database_subnets
vpc_security_group_ids = var.rds_service_security_groups
parameter_group_name = aws_db_parameter_group.education.name
publicly_accessible = false
skip_final_snapshot = true
}
rds/variables.tf
variable "db_username" {
description = "RDS root username"
default = "someusername"
}
variable "db_password" {
description = "RDS root user password"
sensitive = true
}
variable "vpc_id" {
description = "VPC ID"
}
variable "rds_service_security_groups" {
description = "Comma separated list of security groups"
}
variable "database_subnets" {
description = "List of private subnets"
}
And in the latter I create the subnets, etc.
vpc/main.tf
resource "aws_subnet" "private" {
vpc_id = aws_vpc.main.id
cidr_block = element(var.private_subnets, count.index)
availability_zone = element(var.availability_zones, count.index)
count = length(var.private_subnets)
tags = {
Name = "${var.name}-private-subnet-${var.environment}-${format("%03d", count.index+1)}"
Environment = var.environment
}
}
resource "aws_subnet" "public" {
vpc_id = aws_vpc.main.id
cidr_block = element(var.public_subnets, count.index)
availability_zone = element(var.availability_zones, count.index)
count = length(var.public_subnets)
map_public_ip_on_launch = true
tags = {
Name = "${var.name}-public-subnet-${var.environment}-${format("%03d", count.index+1)}"
Environment = var.environment
}
}
resource "aws_subnet" "database" {
vpc_id = aws_vpc.main.id
cidr_block = element(var.database_subnets, count.index)
availability_zone = element(var.availability_zones, count.index)
count = length(var.database_subnets)
tags = {
Name = "Education"
Environment = var.environment
}
}
vpc/variables.tf
variable "name" {
description = "the name of the stack"
}
variable "environment" {
description = "the name of the environment "
}
variable "cidr" {
description = "The CIDR block for the VPC."
}
variable "public_subnets" {
description = "List of public subnets"
}
variable "private_subnets" {
description = "List of private subnets"
}
variable "database_subnets" {
description = "Database subnetes"
}
variable "availability_zones" {
description = "List of availability zones"
}
Then in the root directory I have a main.tf file where I create everything. In there I call the rds module:
main.tf
module "rds" {
source = "./rds"
vpc_id = module.vpc.id
database_subnets = module.vpc.database_subnets
rds_service_security_groups = [module.security_groups.rds]
db_password = var.db_password
}
The error that I keep getting is this:
Error: Incorrect attribute value type
│
│ on rds\main.tf line 19, in resource "aws_db_instance" "education":
│ 19: db_subnet_group_name = var.database_subnets
│ ├────────────────
│ │ var.database_subnets is tuple with 2 elements
│
│ Inappropriate value for attribute "db_subnet_group_name": string required.
Any idea how I can fix it?
You are trying to pass a list of DB Subnets into a parameter that takes a DB Subnet Group name.
You need to modify your RDS module to create a DB Subnet Group with the given subnet IDs, and then pass that group name to the instance:
resource "aws_db_subnet_group" "education" {
name = "education"
subnet_ids = var.database_subnets
}
resource "aws_db_instance" "education" {
identifier = "education"
db_subnet_group_name = aws_db_subnet_group.education.name
...
}
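For module.vpc.database_subnets to be usable here, the vpc module also needs to export the subnet IDs as an output. A minimal sketch of what that output might look like, assuming the aws_subnet.database resources shown above (the file it lives in is an assumption):
# Sketch: an output in the vpc module exposing the database subnet IDs
output "database_subnets" {
  description = "IDs of the database subnets"
  value       = aws_subnet.database[*].id
}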

Upgrade AWS RDS PostgreSQL from 10.18 to 13.4 using terraform

I am blocked while upgrading AWS RDS PostgreSQL from 10.18 to 13.4 using Terraform. Below is the TF code used and the errors. Please suggest.
resource "aws_rds_cluster_instance" "aws_rds_cluster_instance" {
count = 3
identifier = "aws_rds_cluster_instance-${count.index}"
instance_class = "db.r4.large"
availability_zone = data.aws_availability_zones.available.names[count.index]
engine = "aurora-postgresql"
engine_version = aws_rds_cluster.aws_rds_cluster.engine_version
cluster_identifier = aws_rds_cluster.aws_rds_cluster.id
monitoring_interval = "60"
monitoring_role_arn = "arn:aws:iam::12345678910:role/role"
performance_insights_enabled = "true"
preferred_maintenance_window = "mon:12:20-mon:12:50"
promotion_tier = "1"
publicly_accessible = "false"
db_subnet_group_name = aws_db_subnet_group.db.name
tags = merge(map(
"system", "local.system_name",
"business_unit", "local.business_unit",
), local.base_tags)
lifecycle {
ignore_changes = [engine_version]
prevent_destroy = true
}
}
resource "aws_rds_cluster" "aws_rds_cluster" {
engine = "aurora-postgresql"
engine_version = "10.18"
cluster_identifier = "aws_rds_cluster_instance-cluster"
availability_zones = data.aws_availability_zones.available.names
database_name = "aws_rds_cluster_instance"
deletion_protection = true
master_username = var.db_username
master_password = var.db_password
backup_retention_period = 7
preferred_backup_window = "07:00-08:00"
preferred_maintenance_window = "mon:23:00-mon:23:30"
port = 5432
skip_final_snapshot = "true"
storage_encrypted = "true"
iam_database_authentication_enabled = "true"
kms_key_id = aws_kms_key.key.arn
vpc_security_group_ids = [aws_security_group.db.id]
db_subnet_group_name = aws_db_subnet_group.db.name
db_cluster_parameter_group_name = aws_rds_cluster_parameter_group.param_group.name
tags = merge(map(
"system", "local.system_name",
"business_unit", "local.business_unit",
), local.base_tags)
lifecycle {
ignore_changes = [engine_version]
prevent_destroy = true
}
}
I am getting the error below after changing the engine version and parameter group values.
Error: Failed to modify RDS Cluster (aws_rds_cluster_instance): InvalidParameterCombination: The AllowMajorVersionUpgrade flag must be present when upgrading to a new major version.
status code: 400, request id: 648f75a8-abcd-49be-z480-056e71e86e6c
on main.tf line 62, in resource "aws_rds_cluster" "aws_rds_cluster":
62: resource "aws_rds_cluster" "aws_rds_cluster" {
When I added allow_major_version_upgrade = true under the aws_rds_cluster_instance and aws_rds_cluster resources, I got the error below:
Error: Unsupported argument
on main.tf line 49, in resource "aws_rds_cluster_instance" "aws_rds_cluster_instance":
49: allow_major_version_upgrade = true
An argument named "allow_major_version_upgrade" is not expected here.
Error: Unsupported argument
on main.tf line 63, in resource "aws_rds_cluster" "aws_rds_cluster":
63: allow_major_version_upgrade = true
An argument named "allow_major_version_upgrade" is not expected here.
The allow_major_version_upgrade argument belongs to the schema of the aws_rds_cluster resource. Since the error states that the argument is unsupported, your version of the AWS provider is too old to support it.
First upgrade the AWS provider to at least version 3.8.0, which is the first version that supports the argument.
Then you can supply the argument and value on the cluster resource as expected:
resource "aws_rds_cluster" "aws_rds_cluster" {
allow_major_version_upgrade = true
...
}
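One way to pick up a new enough provider is a version constraint in required_providers; a minimal sketch (any constraint of at least 3.8.0 will do):
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.8.0" # allow_major_version_upgrade on aws_rds_cluster needs at least this
    }
  }
}
After changing the constraint, run terraform init -upgrade so the newer provider version is actually downloaded.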

Terraform postgresql provider fails to create the role and database after provisioning in AWS

I'm trying to provision Postgres in AWS and then create the database and roles sequentially using Terraform.
But I'm getting the exception below and I am not able to create the role/db.
terraform {
  required_providers {
    # postgresql = {
    #   source  = "cyrilgdn/postgresql"
    #   version = "1.15.0"
    # }
    postgresql = {
      source  = "terraform-providers/postgresql"
      version = ">=1.7.2"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "2.4.1"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "4.0.0"
    }
  }
}
resource "aws_db_instance" "database" {
identifier = "dev-test"
allocated_storage = 100
storage_type = "gp2"
engine = "postgres"
engine_version = "13.4"
port = 5432
instance_class = "db.t3.micro"
username = "postgres"
performance_insights_enabled = true
password = "postgres$123"
db_subnet_group_name = "some_name"
vpc_security_group_ids = ["sg_name"]
parameter_group_name = "default.postgres13"
publicly_accessible = true
delete_automated_backups = false
storage_encrypted = true
tags = {
Name = "dev-test"
}
skip_final_snapshot = true
}
#To create the "raw" database
provider "postgresql" {
version = ">=1.4.0"
database = "raw"
host = aws_db_instance.database.address
port = aws_db_instance.database.port
username = aws_db_instance.database.username
password = aws_db_instance.database.password
sslmode = "require"
connect_timeout = 15
superuser = false
expected_version = aws_db_instance.database.engine_version
}
# Creation of the role
resource "postgresql_role" "application_role" {
  provider           = postgresql
  name               = "test"
  login              = true
  password           = "test$123"
  encrypted_password = true
  create_database    = false

  depends_on = [aws_db_instance.database]
}
Error -
Error: dial tcp 18.221.183.66:5432: i/o timeout
│
│ with postgresql_role.application_role,
│ on main.tf line 79, in resource "postgresql_role" "application_role":
│ 79: resource "postgresql_role" "application_role" {
│
╵
I noticed a few people saying that including the expected_version attribute with the latest version should work.
However, even after including the expected_version attribute, the issue persists.
I need to provision Postgres in AWS and create the DB and roles.
What could be the issue with my script?
As per documentation [1], you are missing the scheme in the postgresql provider:
provider "postgresql" {
scheme = "awspostgres"
database = "raw"
host = aws_db_instance.database.address
port = aws_db_instance.database.port
username = aws_db_instance.database.username
password = aws_db_instance.database.password
sslmode = "require"
connect_timeout = 15
superuser = false
expected_version = aws_db_instance.database.engine_version
}
Additionally, I am not sure whether you can use database = "raw" or whether it has to be database = "postgres", which is the default value and therefore does not need to be specified.
One other note: you do not need to pin the provider version in the provider block or set provider = postgresql on every resource. You declare the provider once in the required_providers block (as you did for the aws provider), and any resource belonging to that provider will use it by default. In other words, you can remove version = ">=1.4.0" from the provider "postgresql" block and provider = postgresql from the postgresql_role.application_role resource, and the code should still work.
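Putting those notes together, the provider setup might look like the sketch below. It reuses the cyrilgdn/postgresql source and version that are commented out in the question; treat the exact version as an assumption:
# Sketch only: provider declared once, version pinned in required_providers,
# no provider meta-argument on the role resource.
terraform {
  required_providers {
    postgresql = {
      source  = "cyrilgdn/postgresql"
      version = "1.15.0"
    }
  }
}

provider "postgresql" {
  scheme           = "awspostgres"
  host             = aws_db_instance.database.address
  port             = aws_db_instance.database.port
  username         = aws_db_instance.database.username
  password         = aws_db_instance.database.password
  sslmode          = "require"
  connect_timeout  = 15
  superuser        = false
  expected_version = aws_db_instance.database.engine_version
}

resource "postgresql_role" "application_role" {
  name               = "test"
  login              = true
  password           = "test$123"
  encrypted_password = true
  create_database    = false

  depends_on = [aws_db_instance.database]
}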
[1] https://registry.terraform.io/providers/cyrilgdn/postgresql/latest/docs#aws

Helm - Kubernetes cluster unreachable: the server has asked for the client to provide credentials

I'm trying to deploy a self-managed EKS cluster with Terraform. While I can deploy the cluster with addons, VPC, subnets and all other resources, it always fails at Helm:
Error: Kubernetes cluster unreachable: the server has asked for the client to provide credentials
with module.eks-ssp-kubernetes-addons.module.ingress_nginx[0].helm_release.nginx[0]
on .terraform/modules/eks-ssp-kubernetes-addons/modules/kubernetes-addons/ingress-nginx/main.tf line 19, in resource "helm_release" "nginx":
resource "helm_release" "nginx" {
This error repeats for metrics_server, lb_ingress, argocd, but cluster-autoscaler throws:
Warning: Helm release "cluster-autoscaler" was created but has a failed status.
with module.eks-ssp-kubernetes-addons.module.cluster_autoscaler[0].helm_release.cluster_autoscaler[0]
on .terraform/modules/eks-ssp-kubernetes-addons/modules/kubernetes-addons/cluster-autoscaler/main.tf line 1, in resource "helm_release" "cluster_autoscaler":
resource "helm_release" "cluster_autoscaler" {
My main.tf looks like this:
terraform {
  backend "remote" {}

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.66.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.7.1"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.4.1"
    }
  }
}
data "aws_eks_cluster" "cluster" {
name = module.eks-ssp.eks_cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
name = module.eks-ssp.eks_cluster_id
}
provider "aws" {
access_key = "xxx"
secret_key = "xxx"
region = "xxx"
assume_role {
role_arn = "xxx"
}
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
token = data.aws_eks_cluster_auth.cluster.token
}
provider "helm" {
kubernetes {
host = data.aws_eks_cluster.cluster.endpoint
token = data.aws_eks_cluster_auth.cluster.token
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
}
}
My eks.tf looks like this:
module "eks-ssp" {
source = "github.com/aws-samples/aws-eks-accelerator-for-terraform"
# EKS CLUSTER
tenant = "DevOpsLabs2b"
environment = "dev-test"
zone = ""
terraform_version = "Terraform v1.1.4"
# EKS Cluster VPC and Subnet mandatory config
vpc_id = "xxx"
private_subnet_ids = ["xxx","xxx", "xxx", "xxx"]
# EKS CONTROL PLANE VARIABLES
create_eks = true
kubernetes_version = "1.19"
# EKS SELF MANAGED NODE GROUPS
self_managed_node_groups = {
self_mg = {
node_group_name = "DevOpsLabs2b"
subnet_ids = ["xxx","xxx", "xxx", "xxx"]
create_launch_template = true
launch_template_os = "bottlerocket" # amazonlinux2eks or bottlerocket or windows
custom_ami_id = "xxx"
public_ip = true # Enable only for public subnets
pre_userdata = <<-EOT
yum install -y amazon-ssm-agent \
systemctl enable amazon-ssm-agent && systemctl start amazon-ssm-agent \
EOT
disk_size = 10
instance_type = "t2.small"
desired_size = 2
max_size = 10
min_size = 0
capacity_type = "" # Optional Use this only for SPOT capacity as capacity_type = "spot"
k8s_labels = {
Environment = "dev-test"
Zone = ""
WorkerType = "SELF_MANAGED_ON_DEMAND"
}
additional_tags = {
ExtraTag = "t2x-on-demand"
Name = "t2x-on-demand"
subnet_type = "public"
}
create_worker_security_group = false # Creates a dedicated sec group for this Node Group
},
}
}
enable_amazon_eks_vpc_cni = true
amazon_eks_vpc_cni_config = {
  addon_name               = "vpc-cni"
  addon_version            = "v1.7.5-eksbuild.2"
  service_account          = "aws-node"
  resolve_conflicts        = "OVERWRITE"
  namespace                = "kube-system"
  additional_iam_policies  = []
  service_account_role_arn = ""
  tags                     = {}
}

enable_amazon_eks_kube_proxy = true
amazon_eks_kube_proxy_config = {
  addon_name               = "kube-proxy"
  addon_version            = "v1.19.8-eksbuild.1"
  service_account          = "kube-proxy"
  resolve_conflicts        = "OVERWRITE"
  namespace                = "kube-system"
  additional_iam_policies  = []
  service_account_role_arn = ""
  tags                     = {}
}

# K8s Add-ons
enable_aws_load_balancer_controller = true
enable_metrics_server               = true
enable_cluster_autoscaler           = true
enable_aws_for_fluentbit            = true
enable_argocd                       = true
enable_ingress_nginx                = true

depends_on = [module.eks-ssp.self_managed_node_groups]
}
The OP confirmed in the comments that the problem was resolved:
Of course. I think I found the issue. Doing "kubectl get svc" throws: "An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:iam::xxx:user/terraform_deploy is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::xxx:user/terraform_deploy"
Solved it by using my actual role, that's crazy. No idea why it was calling itself.
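In other words, the assume_role block in the aws provider was pointing at an IAM user ARN instead of an IAM role ARN. A minimal sketch of the corrected shape (the role name is hypothetical, and the account ID stays redacted as in the question):
provider "aws" {
  access_key = "xxx"
  secret_key = "xxx"
  region     = "xxx"

  assume_role {
    # Must be an IAM role, not an IAM user; the role name here is a placeholder.
    role_arn = "arn:aws:iam::xxx:role/terraform-deploy-role"
  }
}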
For a similar problem, also have a look at this issue.
I solved this error by adding dependencies to the Helm installations.
depends_on waits for the referenced step to complete successfully, and only then does the Helm module run.
module "nginx-ingress" {
depends_on = [module.eks, module.aws-load-balancer-controller]
source = "terraform-module/release/helm"
...}
module "aws-load-balancer-controller" {
depends_on = [module.eks]
source = "terraform-module/release/helm"
...}
module "helm_autoscaler" {
depends_on = [module.eks]
source = "terraform-module/release/helm"
...}

Is it possible to create a zone only node pool in a regional cluster in GKE?

I have a regional cluster for redundancy. In this cluster I want to create a node pool in just one zone of that region. Is this configuration possible? The reason I am trying this is that I want to run a service like RabbitMQ in just one zone to avoid split-brain, while my application services run in all zones of the region for redundancy.
I am using Terraform to create the cluster and node pools. Below is my config for creating the regional cluster and the zonal node pool:
resource "google_container_cluster" "regional_cluster" {
provider = google-beta
project = "my-project"
name = "my-cluster"
location = "us-central1"
node_locations = ["us-central1-a", "us-central1-b", "us-central1-c"]
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
}
resource "google_container_node_pool" "one_zone" {
project = google_container_cluster.regional_cluster.project
name = "zone-pool"
location = "us-central1-b"
cluster = google_container_cluster.regional_cluster.name
node_config {
machine_type = var.machine_type
image_type = var.image_type
disk_size_gb = 100
disk_type = "pd-standard"
}
}
This throws the following error message:
error creating NodePool: googleapi: Error 404: Not found: projects/my-project/zones/us-central1-b/clusters/my-cluster., notFound
I found out that location in google_container_node_pool should specify the cluster master's region/zone. To specify where the node pool's nodes actually run, node_locations should be used. Below is the config that worked:
resource "google_container_cluster" "regional_cluster" {
provider = google-beta
project = "my-project"
name = "my-cluster"
location = "us-central1"
node_locations = ["us-central1-a", "us-central1-b", "us-central1-c"]
master_auth {
username = ""
password = ""
client_certificate_config {
issue_client_certificate = false
}
}
}
resource "google_container_node_pool" "one_zone" {
project = google_container_cluster.regional_cluster.project
name = "zone-pool"
location = google_container_cluster.regional_cluster.location
node_locations = ["us-central1-b"]
cluster = google_container_cluster.regional_cluster.name
node_config {
machine_type = var.machine_type
image_type = var.image_type
disk_size_gb = 100
disk_type = "pd-standard"
}
}