Restoring a AWS documentdb snapshot with terraform - mongodb

I am unsure how to restore an AWS documentdb cluster that is managed by terraform.
My terraform setup looks like this:
resource "aws_docdb_cluster" "this" {
cluster_identifier = var.env_name
engine = "docdb"
engine_version = "4.0.0"
master_username = "USERNAME"
master_password = random_password.this.result
db_cluster_parameter_group_name = aws_docdb_cluster_parameter_group.this.name
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
db_subnet_group_name = aws_docdb_subnet_group.this.name
deletion_protection = true
backup_retention_period = 7
preferred_backup_window = "07:00-09:00"
skip_final_snapshot = false
# Added on 6.25.22 to rollback an incorrect application of the namespace
# migration, which occurred at 2AM EST on June 23.
snapshot_identifier = "...the arn for the snapshot..."
}
resource "aws_docdb_cluster_instance" "this_2a" {
count = 1
engine = "docdb"
availability_zone = "us-east-1a"
auto_minor_version_upgrade = true
cluster_identifier = aws_docdb_cluster.this.id
instance_class = "db.r5.large"
}
resource "aws_docdb_cluster_instance" "this_2b" {
count = 1
engine = "docdb"
availability_zone = "us-east-1b"
auto_minor_version_upgrade = true
cluster_identifier = aws_docdb_cluster.this.id
instance_class = "db.r5.large"
}
resource "aws_docdb_subnet_group" "this" {
name = var.env_name
subnet_ids = module.vpc.private_subnets
}
I added the snapshot_identifier parameter and applied it, expecting a rollback. However, this did not have the intended effect of restoring documentdb state to its settings on June 23rd. (As far as I can tell, nothing changed at all)
I wanted to avoid using the AWS console approach (described here) because that creates a new cluster which won't be tracked by terraform.
What is the proper way of accomplishing this rollback using terraform?

The snapshot_identifier parameter is only used when Terraform creates a new cluster. Setting it after the cluster has been created just tells Terraform "If you ever have to recreate this cluster, use this snapshot".
To actually get Terraform to recreate the cluster you would need to do something else to make Terraform think the cluster needs to be recreated. Possible options are:
Run terraform taint aws_docdb_cluster.this to signal to Terraform that the resource needs to be recreated. It will then recreate it the next time you run terraform apply.
Delete the cluster through some other means, like the AWS console, and then run terraform apply.

The general approach is this, but i have no experience with documentdb. Hope this helps.
0. Take a backup of your terrafrom state file terraform state pull > backup_state_file_timestamp.json
Restore through the console to the point in time you want.
Remove the old instances and cluster from your terraform state file
terraform state rm aws_docdb_cluster_instance.this_2a
terraform state rm aws_docdb_cluster_instance.this_2b
terraform state rm aws_docdb_cluster.this
Import the manually restored cluster and instance into terraform
terraform import aws_docdb_cluster.this cluster_identifier
terraform import rm aws_docdb_cluster_instance.this_2a identifier
terraform import rm aws_docdb_cluster_instance.this_2b identifier
(see import at the bottom https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/docdb_cluster_instance and https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/docdb_cluster)

Related

To enable preview feature of azure resource provider

I would like to enable an azure preview feature via terraform. I have configured skip provider registration but when I tried to apply still get provider already exists error. I have to import manually as a workaround.
QUERY?:
do we must import manually to avoid provider exist error when register preview feature?
as I already define skip registration but seems it didn’t work.
Thanks!
======== configuration ========
Configure the Azure provider
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm”
version = “~> 2.99"
}
}
required_version = “>= 1.1.0”
}
provider “azurerm” {
features {}
skip_provider_registration = true
}
resource “azurerm_resource_provider_registration” “example” {
name = “Microsoft.Network”
feature {
name = “AFWEnableNetworkRuleNameLogging”
registered = true
}
}
have configured to skip provider registration but when I tried to apply still get provider already exists.
======== error log
terraform apply main.tf plan
azurerm_resource_provider_registration.example: Creating…
╷
│ Error: A resource with the ID. “/subscriptions/xxxx-xxxx/providers/Microsoft.Network” already exists - to be managed via Terraform this resource needs to be imported into the State. Please see the resource documentation for “azurerm_resource_provider_registration” for more information.
Any solution on the above requirement to enable the preview feature of the corresponding namespace resource provider.
If the Terraform statefile already contains the relevant providers, we should import it first before making any changes. Then only Terraform will read the respective changes from statefile.
Step1:
Add below code in provider tf and main tf as below
provider tf file
terraform {
required_version = ">= 1.1.0"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 2.99"
}
}
}
provider "azurerm" {
features {}
skip_provider_registration = true
}
main tf file as follows
resource "azurerm_resource_provider_registration" "example" {
name = "Microsoft.Network"
feature {
name = "AFWEnableNetworkRuleNameLogging"
registered = true
}
}
Step2:
Run bellow commands
terraform plan
Run below command
terraform apply -auto-approve
NOTE:
if error saying its "already exists - to be managed via Terraform this resource needs to be imported into the State." then please run below command to import the respective service via terraform
terraform import azurerm_resource_provider_registration.example /subscriptions/************************/providers/Microsoft.Network
Output as follows:
Step3:
run below commands
terraform plan
terraform apply -auto-approve

What does this error mean when trying to use an AppRole from Vault on an Ingress deployment?

Context
We were trying to fix an inconsistency between Terraform and our cloud provider because a database was deleted through the cloud's UI console and the changes were not properly imported into Terraform.
For reasons we preferred to not do terraform import and proceeded to change the state file to remove all references to that database hoping that would allow us to run things like plan, and it did work, but we came across other issues...
Oh, I should add that we run things like Helm through Terraform to set up our Kubernetes infra as well.
The problem
Now Terraform makes a plan to remove a Google Container Node Pool (desired outcome) and to update a Kubernetes resource of kind Ingress. The latter change is not really intended, although it could be because there's a Terraform module dependency between the module that sets up all the cluster (including node pools) and the module that sets up Ingress.
Now the issue comes from updating that Ingress. Here's the plan:
# Terraform will read AppRole from Vault
data "vault_approle_auth_backend_role_id" "role" {
- backend = "approle" -> null
~ id = "auth/approle/role/nginx-ingress/role-id" -> (known after apply)
~ role_id = "<some UUID>" -> (known after apply)
role_name = "nginx-ingress"
}
# Now this is the resource that makes everything blow up
resource "helm_release" "nginx-ingress" {
atomic = false
chart = ".terraform/modules/nginx-ingress/terraform/../helm"
...
...
- set_sensitive {
- name = "appRole.roleId" -> null
- value = (sensitive value)
}
+ set_sensitive {
+ name = "appRole.roleId"
+ value = (sensitive value)
}
- set_sensitive {
- name = "appRole.secretId" -> null
- value = (sensitive value)
}
+ set_sensitive {
+ name = "appRole.secretId"
+ value = (sensitive value)
}
}
And here's the error message we get:
When expanding the plan for module.nginx-ingress.helm_release.nginx-ingress to
include new values learned so far during apply, provider
"registry.terraform.io/hashicorp/helm" produced an invalid new value for
.set_sensitive: planned set element
cty.ObjectVal(map[string]cty.Value{"name":cty.StringVal("appRole.secretId"),
"type":cty.NullVal(cty.String),
"value":cty.StringVal("<some other UUID>")}) does not
correlate with any element in actual.
This is a bug in the provider, which should be reported in the provider's own
issue tracker.
What we tried
We thought that maybe the AppRole's secretId had rotated or changed, so we took the secretId from the State of another environment that uses the same AppRole from the same Vault and set it in our modified state file. That didn't work.

Terraform Unable to find Helm Release charts

I'm running Kubernetes on GCP and doing changes via Terraform v0.11.14
When running terraform plan I'm getting the error messages here
Error: Error refreshing state: 2 errors occurred:
* module.cls-xxx-us-central1-a-dev.helm_release.cert-manager: 1 error occurred:
* module.cls-xxx-us-central1-a-dev.helm_release.cert-manager: helm_release.cert-manager: error installing: the server could not find the requested resource
* module.cls-xxx-us-central1-a-dev.helm_release.nginx: 1 error occurred:
* module.cls-xxx-us-central1-a-dev.helm_release.nginx: helm_release.nginx: error installing: the server could not find the requested resource
Here's a copy of my helm.tf
resource "helm_release" "nginx" {
depends_on = ["google_container_node_pool.tally-np"]
name = "ingress-nginx"
chart = "ingress-nginx/ingress-nginx"
namespace = "kube-system"
}
resource "helm_release" "cert-manager" {
depends_on = ["google_container_node_pool.tally-np"]
name = "cert-manager"
chart = "stable/cert-manager"
namespace = "kube-system"
set {
name = "ingressShim.defaultIssuerName"
value = "letsencrypt-production"
}
set {
name = "ingressShim.defaultIssuerKind"
value = "ClusterIssuer"
}
provisioner "local-exec" {
command = "gcloud container clusters get-credentials ${var.cluster_name} --zone ${google_container_cluster.cluster.zone} && kubectl create -f ${path.module}/letsencrypt-prod.yaml"
}
}
I've read that Helm deprecated most of the old chart repos so I tried adding the repositories and installing the charts locally under the namespace kube-system but so far the issue is still persisting.
Here's the list of versions for Terraform and it's providers
Terraform v0.11.14
provider.google v2.17.0
provider.helm v0.10.2
provider.kubernetes v1.9.0
provider.random v2.2.1
As the community is moving towards Helm v3, the maintainers have depreciated the old helm model where we had a single mono repo called stable. The new model is like each product having its own repo. On November 13, 2020 the stable and incubator charts repository reached the end of development and became archives.
The archived charts are now hosted at a new URL. To continue using the archived charts, you will have to make some tweaks in your helm workflow.
Sample workaround:
helm repo add new-stable https://charts.helm.sh/stable
helm fetch new-stable/prometheus-operator

Create GKE cluster and namespace with Terraform

I need to create GKE cluster and then create namespace and install db through helm to that namespace. Now I have gke-cluster.tf that creates cluster with node pool and helm.tf, that has kubernetes provider and helm_release resource. It first creates cluster, but then tries to install db but namespace doesn't exist yet, so I have to run terraform apply again and it works. I want to avoid scenario with multiple folder and run terraform apply only once. What's the good practice for situaction like this? Thanks for the answers.
The create_namespace argument of helm_release resource can help you.
create_namespace - (Optional) Create the namespace if it does not yet exist. Defaults to false.
https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release#create_namespace
Alternatively, you can define a dependency between the namespace resource and helm_release like below:
resource "kubernetes_namespace" "prod" {
metadata {
annotations = {
name = "prod-namespace"
}
labels = {
namespace = "prod"
}
name = "prod"
}
}
resource "helm_release" "arango-crd" {
name = "arango-crd"
chart = "./kube-arangodb-crd"
namespace = "prod"
depends_on = [ kubernetes_namespace.prod ]
}
The solution posted by user adp is correct but I wanted to give more insight on using Terraform for this particular example in regards of running single commmand:
$ terraform apply --auto-approve.
Basing on following comments:
Can you tell how are you creating your namespace? Is it with kubernetes provider? - Dawid Kruk
resource "kubernetes_namespace" - Jozef Vrana
This setup needs specific order of execution. First the cluster, then the resources. By default Terraform will try to create all of the resources at the same time. It is crucial to use a parameter depends_on = [VALUE].
The next issue is that the kubernetes provider will try to fetch the credentials at the start of the process from ~/.kube/config. It will not wait for the cluster provisioning to get the actual credentials. It could:
fail when there is no .kube/config
fetch credentials for the wrong cluster.
There is ongoing feature request to resolve this kind of use case (also there are some workarounds):
Github.com: Hashicorp: Terraform: Issue: depends_on for providers
As an example:
# Create cluster
resource "google_container_cluster" "gke-terraform" {
project = "PROJECT_ID"
name = "gke-terraform"
location = var.zone
initial_node_count = 1
}
# Get the credentials
resource "null_resource" "get-credentials" {
depends_on = [google_container_cluster.gke-terraform]
provisioner "local-exec" {
command = "gcloud container clusters get-credentials ${google_container_cluster.gke-terraform.name} --zone=europe-west3-c"
}
}
# Create a namespace
resource "kubernetes_namespace" "awesome-namespace" {
depends_on = [null_resource.get-credentials]
metadata {
name = "awesome-namespace"
}
}
Assuming that you had earlier configured cluster to work on and you didn't delete it:
Credentials for Kubernetes cluster are fetched.
Terraform will create a cluster named gke-terraform
Terraform will run a local command to get the credentials for gke-terraform cluster
Terraform will create a namespace (using old information):
if you had another cluster in .kube/config configured, it will create a namespace in that cluster (previous one)
if you deleted your previous cluster, it will try to create a namespace in that cluster and fail (previous one)
if you had no .kube/config it will fail on the start
Important!
Using "helm_release" resource seems to get the credentials when provisioning the resources, not at the start!
As said you can use helm provider to provision the resources on your cluster to avoid the issues I described above.
Example on running a single command for creating a cluster and provisioning resources on it:
variable zone {
type = string
default = "europe-west3-c"
}
resource "google_container_cluster" "gke-terraform" {
project = "PROJECT_ID"
name = "gke-terraform"
location = var.zone
initial_node_count = 1
}
data "google_container_cluster" "gke-terraform" {
project = "PROJECT_ID"
name = "gke-terraform"
location = var.zone
}
resource "null_resource" "get-credentials" {
# do not start before resource gke-terraform is provisioned
depends_on = [google_container_cluster.gke-terraform]
provisioner "local-exec" {
command = "gcloud container clusters get-credentials ${google_container_cluster.gke-terraform.name} --zone=${var.zone}"
}
}
resource "helm_release" "mydatabase" {
name = "mydatabase"
chart = "stable/mariadb"
# do not start before the get-credentials resource is run
depends_on = [null_resource.get-credentials]
set {
name = "mariadbUser"
value = "foo"
}
set {
name = "mariadbPassword"
value = "qux"
}
}
Using above configuration will yield:
data.google_container_cluster.gke-terraform: Refreshing state...
google_container_cluster.gke-terraform: Creating...
google_container_cluster.gke-terraform: Still creating... [10s elapsed]
<--OMITTED-->
google_container_cluster.gke-terraform: Still creating... [2m30s elapsed]
google_container_cluster.gke-terraform: Creation complete after 2m38s [id=projects/PROJECT_ID/locations/europe-west3-c/clusters/gke-terraform]
null_resource.get-credentials: Creating...
null_resource.get-credentials: Provisioning with 'local-exec'...
null_resource.get-credentials (local-exec): Executing: ["/bin/sh" "-c" "gcloud container clusters get-credentials gke-terraform --zone=europe-west3-c"]
null_resource.get-credentials (local-exec): Fetching cluster endpoint and auth data.
null_resource.get-credentials (local-exec): kubeconfig entry generated for gke-terraform.
null_resource.get-credentials: Creation complete after 1s [id=4191245626158601026]
helm_release.mydatabase: Creating...
helm_release.mydatabase: Still creating... [10s elapsed]
<--OMITTED-->
helm_release.mydatabase: Still creating... [1m40s elapsed]
helm_release.mydatabase: Creation complete after 1m44s [id=mydatabase]

Unable to create windows nodepool on GKE cluster with google terraform GKE module

I am trying to provision a GKE cluster with windows node_pool using google modules, I am calling module
source = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"
version = "9.2.0"
I had to define two pools one for linux pool required by GKE and the windows one we require, terraform always succeeds in provisioning the linux node_pool but fails to provision the windows one and the error message
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m31s elapsed]
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m41s elapsed]
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m51s elapsed]
module.gke.google_container_cluster.primary: Modifications complete after 24m58s [id=projects/xx-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev]
module.gke.google_container_node_pool.pools["windows-node-pool"]: Creating...
Error: error creating NodePool: googleapi: Error 400: Workload Identity is not supported on Windows nodes. Create the nodepool without workload identity by specifying --workload-metadata=GCE_METADATA., badRequest
on .terraform\modules\gke\terraform-google-kubernetes-engine-9.2.0\modules\beta-private-cluster-update-variant\cluster.tf line 341, in resource "google_container_node_pool" "pools":
341: resource "google_container_node_pool" "pools" {
I tried many places to set this metadata values but I coldn't get it right:
From terraform side :
I tried many places to add this metadata inside the node_config scope in the module itself or in my main.tf file where I call the module I tried to add it to the windows node_pool scope of the node_pools list but it didn't accept it with a message that setting WORKLOAD IDENTITY isn't expected here
I tried also setting enable_shielded_nodes = false but this didn't really help much.
I tried to test this if it is doable even through the command line this was my command line
C:\>gcloud container node-pools --region europe-west2 list
NAME MACHINE_TYPE DISK_SIZE_GB NODE_VERSION
default-node-pool-d916 n1-standard-2 100 1.17.9-gke.600
C:\>gcloud container node-pools --region europe-west2 create window-node-pool --cluster=gke-nonpci-dev --image-type=WINDOWS_SAC --no-enable-autoupgrade --machine-type=n1-standard-2
WARNING: Starting in 1.12, new node pools will be created with their legacy Compute Engine instance metadata APIs disabled by default. To create a node pool with legacy instance metadata endpoints disabled, run `node-pools create` with the flag `--metadata disable-legacy-endpoints=true`.
This will disable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Workload Identity is not supported on Windows nodes. Create the nodepool without workload identity by specifying --workload-metadata=GCE_METADATA.
C:\>gcloud container node-pools --region europe-west2 create window-node-pool --cluster=gke-nonpci-dev --image-type=WINDOWS_SAC --no-enable-autoupgrade --machine-type=n1-standard-2 --workload-metadata=GCE_METADATA --metadata disable-legacy-endpoints=true
This will disable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Service account "874988475980-compute#developer.gserviceaccount.com" does not exist.
C:\>gcloud auth list
Credentialed Accounts
ACTIVE ACCOUNT
* tf-xxx-xxx-xx-xxx#xx-xxx-xx-xxx-xxxx.iam.gserviceaccount.com
This service account from running gcloud auth list is the one I am running terraform with but I don't know where is this one in the error message coming from, even though trying to create the windows nodepool through command line as shown above also didn't work I am a bit stuck and I don't know what to do.
As module 9.2.0 is a stable module for us through all our linux based clusters we setup before, hence I thought this may be an old version for a windows node_pool I used the 11.0.0 instead to see if this would make it any different but ended up with a different error
module.gke.google_container_node_pool.pools["default-node-pool"]: Refreshing state... [id=projects/uk-tix-p1-npe-b821/locations/europe-west2/clusters/gke-nonpci-dev/nodePools/default-node-pool-d916]
Error: failed to execute ".terraform/modules/gke.gcloud_delete_default_kube_dns_configmap/terraform-google-gcloud-1.4.1/scripts/check_env.sh": fork/exec .terraform/modules/gke.gcloud_delete_default_kube_dns_configmap/terraform-google-gcloud-1.4.1/scripts/check_env.sh: %1 is not a valid Win32 application.
on .terraform\modules\gke.gcloud_delete_default_kube_dns_configmap\terraform-google-gcloud-1.4.1\main.tf line 70, in data "external" "env_override":
70: data "external" "env_override" {
Error: failed to execute ".terraform/modules/gke.gcloud_wait_for_cluster/terraform-google-gcloud-1.3.0/scripts/check_env.sh": fork/exec .terraform/modules/gke.gcloud_wait_for_cluster/terraform-google-gcloud-1.3.0/scripts/check_env.sh: %1 is not a valid Win32 application.
on .terraform\modules\gke.gcloud_wait_for_cluster\terraform-google-gcloud-1.3.0\main.tf line 70, in data "external" "env_override":
70: data "external" "env_override" {
This how I set node_pools parameters
node_pools = [
{
name = "linux-node-pool"
machine_type = var.nodepool_instance_type
min_count = 1
max_count = 10
disk_size_gb = 100
disk_type = "pd-standard"
image_type = "COS"
auto_repair = true
auto_upgrade = true
service_account = google_service_account.gke_cluster_sa.email
preemptible = var.preemptible
initial_node_count = 1
},
{
name = "windows-node-pool"
machine_type = var.nodepool_instance_type
min_count = 1
max_count = 10
disk_size_gb = 100
disk_type = "pd-standard"
image_type = var.nodepool_image_type
auto_repair = true
auto_upgrade = true
service_account = google_service_account.gke_cluster_sa.email
preemptible = var.preemptible
initial_node_count = 1
}
]
cluster_resource_labels = var.cluster_resource_labels
# health check and webhook firewall rules
node_pools_tags = {
all = [
"xx-xxx-xxx-local-xxx",
]
}
node_pools_metadata = {
all = {
// workload-metadata = "GCE_METADATA"
}
linux-node-pool = {
ssh-keys = join("\n", [for user, key in var.node_ssh_keys : "${user}:${key}"])
block-project-ssh-keys = true
}
windows-node-pool = {
workload-metadata = "GCE_METADATA"
}
}
this is a shared VPC where I provision my cluster with cluster version: 1.17.9-gke.600
Checkout https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/issues/632 for the solution.
Error message is ambiguous and GKE has an internal bug to track this issue. We will improve the error message soon.