Terraform - AWS - API Gateway dependency conundrum

I am trying to provision some AWS resources, specifically an API Gateway which is connected to a Lambda. I am using Terraform v0.8.8.
I have a module which provisions the Lambda and returns the lambda function ARN as an output, which I then provide as a parameter to the following API Gateway provisioning code (which is based on the example in the TF docs):
provider "aws" {
access_key = "${var.access_key}"
secret_key = "${var.secret_key}"
region = "${var.region}"
}
# Variables
variable "myregion" { default = "eu-west-2" }
variable "accountId" { default = "" }
variable "lambdaArn" { default = "" }
variable "stageName" { default = "lab" }
# API Gateway
resource "aws_api_gateway_rest_api" "api" {
name = "myapi"
}
resource "aws_api_gateway_method" "method" {
rest_api_id = "${aws_api_gateway_rest_api.api.id}"
resource_id = "${aws_api_gateway_rest_api.api.root_resource_id}"
http_method = "GET"
authorization = "NONE"
}
resource "aws_api_gateway_integration" "integration" {
rest_api_id = "${aws_api_gateway_rest_api.api.id}"
resource_id = "${aws_api_gateway_rest_api.api.root_resource_id}"
http_method = "${aws_api_gateway_method.method.http_method}"
integration_http_method = "POST"
type = "AWS"
uri = "arn:aws:apigateway:${var.myregion}:lambda:path/2015-03-31/functions/${var.lambdaArn}/invocations"
}
# Lambda
resource "aws_lambda_permission" "apigw_lambda" {
statement_id = "AllowExecutionFromAPIGateway"
action = "lambda:InvokeFunction"
function_name = "${var.lambdaArn}"
principal = "apigateway.amazonaws.com"
source_arn = "arn:aws:execute-api:${var.myregion}:${var.accountId}:${aws_api_gateway_rest_api.api.id}/*/${aws_api_gateway_method.method.http_method}/resourcepath/subresourcepath"
}
resource "aws_api_gateway_deployment" "deployment" {
rest_api_id = "${aws_api_gateway_rest_api.api.id}"
stage_name = "${var.stageName}"
}
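As an aside, the Lambda ARN gets into var.lambdaArn roughly like this (a minimal sketch with hypothetical module and output names, not my actual code):

# Inside the Lambda module: expose the function ARN as an output
output "lambda_arn" {
  value = "${aws_lambda_function.fn.arn}"
}

# In the root configuration: pass the module output into the API Gateway module
module "lambda" {
  source = "./modules/lambda"
}

module "api_gateway" {
  source    = "./modules/api_gateway"
  lambdaArn = "${module.lambda.lambda_arn}"
}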
When I run the above from scratch (i.e. when none of the resources exist) I get the following error:
Error applying plan:
1 error(s) occurred:
* aws_api_gateway_deployment.deployment: Error creating API Gateway Deployment: BadRequestException: No integration defined for method
status code: 400, request id: 15604135-03f5-11e7-8321-f5a75dc2b0a3
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
If I run a second terraform apply it consistently succeeds, but every time I destroy and apply again from scratch I receive the above error on the first apply.
This caused me to wonder whether there is a dependency I need to declare explicitly somewhere. I discovered #7486, which describes a similar pattern (although relating to an aws_api_gateway_integration_response rather than an aws_api_gateway_deployment). I tried manually adding an explicit dependency from the aws_api_gateway_deployment to the aws_api_gateway_integration, but this had no effect.
Grateful for any thoughts, including whether this may indeed be a TF bug in which case I will raise it in the issue tracker. I thought I'd check with the community before doing so in case I'm missing something obvious.
Many thanks,
Edd
P.S. I've asked this question on the Terraform user group, but that gets very little in the way of responses; I've yet to figure out the cause of the issue, hence asking here.

You are right about the explicit dependency declaration.
Normally Terraform is able to figure out the relationships and schedule create/update/delete operations accordingly - this is mostly possible because of the interpolation mechanisms under the hood (${resource_type.ref_name.attribute}). You can display the relationships affecting this in a graph via terraform graph.
Unfortunately, in this specific case there is no direct relationship between API Gateway Deployments and Integrations - the API for managing API Gateway resources doesn't require you to reference an integration ID (or anything like it) to create a deployment, and the aws_api_gateway_deployment resource in turn doesn't require it either.
The documentation for aws_api_gateway_deployment does mention this caveat at the top of the page. Admittedly, the Deployment requires not only the Method to exist, but the Integration too.
Here's how you can modify your code to get around it:
resource "aws_api_gateway_deployment" "deployment" {
rest_api_id = "${aws_api_gateway_rest_api.api.id}"
stage_name = "${var.stageName}"
depends_on = ["aws_api_gateway_method.method", "aws_api_gateway_integration.integration"]
}
Theoretically the "aws_api_gateway_method.method" is redundant since the integration already references the method in the config:
http_method = "${aws_api_gateway_method.method.http_method}"
so it will be scheduled for creation/update prior to the integration either way, but if you were to change that to something like
http_method = "GET"
then it would become necessary.
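For illustration, here is a hedged sketch of that hypothetical variant (not the fix above), where the implicit edge disappears and both entries in depends_on are needed:

# Hypothetical variant: the integration no longer interpolates the method,
# so the deployment's depends_on must list the method explicitly as well
resource "aws_api_gateway_integration" "integration" {
  rest_api_id = "${aws_api_gateway_rest_api.api.id}"
  resource_id = "${aws_api_gateway_rest_api.api.root_resource_id}"
  http_method = "GET" # literal value, no reference to aws_api_gateway_method.method
  # ...
}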
I have submitted a PR to update the docs accordingly.

Related

What does this error mean when trying to use an AppRole from Vault on an Ingress deployment?

Context
We were trying to fix an inconsistency between Terraform and our cloud provider because a database was deleted through the cloud's UI console and the changes were not properly imported into Terraform.
For reasons of our own we preferred not to do terraform import, and instead edited the state file to remove all references to that database, hoping that would allow us to run things like plan. It did work, but we came across other issues...
Oh, I should add that we run things like Helm through Terraform to set up our Kubernetes infra as well.
The problem
Now Terraform plans to remove a Google Container Node Pool (the desired outcome) and to update a Kubernetes resource of kind Ingress. The latter change is not really intended, although it could be explained by the Terraform module dependency between the module that sets up the whole cluster (including node pools) and the module that sets up the Ingress.
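Roughly, that module dependency looks like this (a hedged sketch with hypothetical module names, not our real wiring):

# The ingress module consumes outputs of the cluster module, so Terraform
# treats the whole ingress stack as downstream of the cluster and its node pools
module "cluster" {
  source = "./modules/gke-cluster"
}

module "nginx-ingress" {
  source           = "./modules/nginx-ingress"
  cluster_endpoint = module.cluster.endpoint
  cluster_ca_cert  = module.cluster.ca_certificate
}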
Now the issue comes from updating that Ingress. Here's the plan:
  # Terraform will read AppRole from Vault
  data "vault_approle_auth_backend_role_id" "role" {
      - backend   = "approle" -> null
      ~ id        = "auth/approle/role/nginx-ingress/role-id" -> (known after apply)
      ~ role_id   = "<some UUID>" -> (known after apply)
        role_name = "nginx-ingress"
    }

  # Now this is the resource that makes everything blow up
  resource "helm_release" "nginx-ingress" {
        atomic = false
        chart  = ".terraform/modules/nginx-ingress/terraform/../helm"
        ...
        ...

      - set_sensitive {
          - name  = "appRole.roleId" -> null
          - value = (sensitive value)
        }
      + set_sensitive {
          + name  = "appRole.roleId"
          + value = (sensitive value)
        }

      - set_sensitive {
          - name  = "appRole.secretId" -> null
          - value = (sensitive value)
        }
      + set_sensitive {
          + name  = "appRole.secretId"
          + value = (sensitive value)
        }
    }
And here's the error message we get:
When expanding the plan for module.nginx-ingress.helm_release.nginx-ingress to
include new values learned so far during apply, provider
"registry.terraform.io/hashicorp/helm" produced an invalid new value for
.set_sensitive: planned set element
cty.ObjectVal(map[string]cty.Value{"name":cty.StringVal("appRole.secretId"),
"type":cty.NullVal(cty.String),
"value":cty.StringVal("<some other UUID>")}) does not
correlate with any element in actual.
This is a bug in the provider, which should be reported in the provider's own
issue tracker.
What we tried
We thought that maybe the AppRole's secretId had rotated or changed, so we took the secretId from the State of another environment that uses the same AppRole from the same Vault and set it in our modified state file. That didn't work.
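For reference, here is a rough reconstruction of what the relevant configuration presumably looks like, based on the plan output above (resource names and the secret-id source are assumptions, not our actual code):

# Read the AppRole's role-id from Vault (matches the data source in the plan)
data "vault_approle_auth_backend_role_id" "role" {
  backend   = "approle"
  role_name = "nginx-ingress"
}

# Pass the role-id and a secret-id into the chart as sensitive values
resource "helm_release" "nginx-ingress" {
  name  = "nginx-ingress"
  chart = "${path.module}/../helm"

  set_sensitive {
    name  = "appRole.roleId"
    value = data.vault_approle_auth_backend_role_id.role.role_id
  }

  set_sensitive {
    name  = "appRole.secretId"
    value = var.approle_secret_id # hypothetical variable; the real source is unknown
  }
}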

fetch and update particular field using terraform

I have a scenario: how can I fetch a particular field's value and also update only that particular field?
For example:
I'm deploying an application using the Terraform "kubernetes_deployment" resource, configured with environment variables (endpoint=abc) and replicas=2.
resource "kubernetes_deployment" “app” {
…..….
spec {
replicas = 2
template {
spec {
….
env {
name = “ENDPOINT”
value = “abc”
}
}
Once I have deployed it using the Terraform script, another script might change the configuration to replicas=5 and different environment values (endpoint=xyz).
Now I need to update only the replicas to 20 (if replicas < 20) through the Terraform script, without changing the environment values (endpoint=abc):
resource "kubernetes_deployment" “app” {
…..….
spec {
replicas = 20 -> only this has to reflect in apply
template {
spec {
….
env {
name = “ENDPOINT”
value = “abc”
}
}
How can I fetch a particular field (replicas), compare whether the replica count is > 20, and update only the replica count?
Can someone with more Terraform experience help me with this?
Inside the "kubernetes_deployment" resource block, consider adding a lifecycle block. Use it to ignore changes to resource attributes that can be made outside of Terraform's knowledge.
Provide a list of resource attributes to "ignore_changes", which Terrform would ignore in subsequent runs. The arguments are the relative address of the attributes in the resource. Map and list elements can be referenced using index notation.
lifecycle {
  ignore_changes = [spec["env"]]
}
Reference: https://www.terraform.io/docs/language/meta-arguments/lifecycle.html#ignore_changes
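Note that in the kubernetes_deployment schema the env blocks sit several levels deep, so the ignore_changes path likely has to follow that nesting. A hedged sketch (the exact path depends on the provider's schema, and index syntax requires Terraform 0.12.6+):

resource "kubernetes_deployment" "app" {
  # ... existing spec ...

  lifecycle {
    # Ignore out-of-band edits to the container env vars while still letting
    # Terraform manage replicas; path assumed from the nested block structure
    # spec > template > spec > container > env
    ignore_changes = [
      spec[0].template[0].spec[0].container[0].env,
    ]
  }
}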

409 (request "Conflict") when creating second Endpoint connection in MongoDB Atlas using Terraform

I need to create many MongoDB Atlas endpoint connections using terraform.
I successfully created the first one using this code:
# Private endpoint connection
resource "mongodbatlas_private_endpoint" "dbpe" {
  project_id    = var.prj_id
  provider_name = "AWS"
  region        = var.aws_region
}

# AWS endpoint for secure connect to mongo db
resource "aws_vpc_endpoint" "ec2" {
  vpc_id = var.sh_vpc

  #service_name = "com.amazonaws.${var.aws_region}.ec2"
  service_name      = mongodbatlas_private_endpoint.dbpe.endpoint_service_name
  vpc_endpoint_type = "Interface"

  security_group_ids = [
    aws_security_group.lb_sg.id,
  ]

  subnet_ids = [
    aws_subnet.subnet1.id,
    var.sh_subnet,
  ]

  tags = {
    "Name" = local.tname
  }

  #private_dns_enabled = true
}
But when I use this code a second time, in another folder (another tfstate), it fails with this error:
Error: error creating MongoDB Private Endpoints Connection: POST https://cloud.mongodb.com/api/atlas/v1.0/groups/***/privateEndpoint: 409 (request "Conflict") A PrivateLink Endpoint Service already exists for AWS region US_EAST_2.
As I understand it, the second "mongodbatlas_private_endpoint" "dbpe" tries to create another Endpoint Service. But when I create the second endpoint manually through the web UI, it uses the same service as the first endpoint.
How can I tell the second endpoint to use the existing service?
Or is my approach wrong altogether?
Please help!
Thank you!
I found the solution.
Creating the "Endpoint Connection" really creates Endpoint only when you do it at first time. All of next times is creating an only association between Atlas endpoint and new AWS Endpoint.
In terraform I tried to create an Atlas endpoint second time and catch an error (because of limit - 1 endpoint per region). All I need to do - is create "Basic Endpoint" one time (by separate folder with own tfstate) and don't delete it. And for each new AWS endpoint need to create a new link from AWS Endpoint to "Basic". I do it by a terraform resource:
mongodbatlas_private_endpoint_interface_link
Resource "mongodbatlas_private_endpoint" is not need now. A "service_name" parameter in "aws_vpc_endpoint" you can hardcoded from "Basic" Endpoint. Use "output" to see mongodbatlas_private_endpoint.test.private_link_id - this is what you need.

Nextflow doesn't use the right service account to deploy workflows to kubernetes

We're trying to use Nextflow on a k8s namespace other than our default; the namespace we're using is nextflownamespace. We've created our PVC and ensured the default service account has an admin rolebinding. We're getting an error that Nextflow can't access the PVC:
"message": "persistentvolumeclaims \"my-nextflow-pvc\" is forbidden:
User \"system:serviceaccount:mynamespace:default\" cannot get resource
\"persistentvolumeclaims\" in API group \"\" in the namespace \"nextflownamespace\"",
In that error we see that system:serviceaccount:mynamespace:default is incorrectly pointing to our default namespace, mynamespace, not nextflownamespace which we created for nextflow use.
We tried adding debug.yaml = true to our nextflow.config but couldn't find the YAML it submits to k8s to validate the error. Our config file looks like this:
profiles {
  standard {
    k8s {
      executor   = "k8s"
      namespace  = "nextflownamespace"
      cpus       = 1
      memory     = 1.GB
      debug.yaml = true
    }
    aws {
      endpoint = "https://s3.nautilus.optiputer.net"
    }
  }
}
We did verify that when we change the namespace to another arbitrary value, the error message uses the new arbitrary namespace, but the service account name continues to point erroneously to the user's default namespace.
We've tried every variant of profiles.standard.k8s.serviceAccount = "system:serviceaccount:nextflownamespace:default" that we could think of but didn't get any change with those attempts.
I think it's best to avoid using nested config profiles with Nextflow. I would either remove the 'standard' layer from your profile or just make 'standard' a separate profile:
profiles {
  standard {
    process.executor = 'local'
  }
  k8s {
    executor   = "k8s"
    namespace  = "nextflownamespace"
    cpus       = 1
    memory     = 1.GB
    debug.yaml = true
  }
  aws {
    endpoint = "https://s3.nautilus.optiputer.net"
  }
}

How to issue letsencrypt certificate for k8s (AKS) using terraform resources?

Summary
I am unable to issue a valid certificate for my Terraform Kubernetes cluster on Azure AKS. The domain and certificate are successfully created (the cert is created according to crt.sh); however, the certificate is not applied to my domain and my browser reports "Kubernetes Ingress Controller Fake Certificate" as the applied certificate.
The Terraform files are converted to the best of my abilities from a working set of YAML files (which issue certificates just fine). See my Terraform code here.
UPDATE! In the original question I was also unable to create certificates. This was fixed by using the "tls_cert_request" resource from here. The change is included in my updated code below.
Here are some things I have checked out and found NOT to be the issue:
The number of issued certificates from acme letsencrypt is not above rate-limits for either staging or prod.
I get the same "Fake certificate" error using both staging or prod certificate server.
Here are some areas that I am currently investigating as potential sources for the error.
I do not see a Terraform equivalent of the Let's Encrypt YAML input "privateKeySecretRef", and consequently I don't know what the value of my deployment ingress annotation "certmanager.k8s.io/cluster-issuer" should be.
If anyone has any other suggestions, I would really appreciate hearing them (as this has been bugging me for quite some time now)!
Certificate Resources
provider "acme" {
server_url = var.context.cert_server
}
resource "tls_private_key" "reg_private_key" {
algorithm = "RSA"
}
resource "acme_registration" "reg" {
account_key_pem = tls_private_key.reg_private_key.private_key_pem
email_address = var.context.email
}
resource "tls_private_key" "cert_private_key" {
algorithm = "RSA"
}
resource "tls_cert_request" "req" {
key_algorithm = "RSA"
private_key_pem = tls_private_key.cert_private_key.private_key_pem
dns_names = [var.context.domain_address]
subject {
common_name = var.context.domain_address
}
}
resource "acme_certificate" "certificate" {
account_key_pem = acme_registration.reg.account_key_pem
certificate_request_pem = tls_cert_request.req.cert_request_pem
dns_challenge {
provider = "azure"
config = {
AZURE_CLIENT_ID = var.context.client_id
AZURE_CLIENT_SECRET = var.context.client_secret
AZURE_SUBSCRIPTION_ID = var.context.azure_subscription_id
AZURE_TENANT_ID = var.context.azure_tenant_id
AZURE_RESOURCE_GROUP = var.context.azure_dns_rg
}
}
}
Pypiserver Ingress Resource
resource "kubernetes_ingress" "pypi" {
metadata {
name = "pypi"
namespace = kubernetes_namespace.pypi.metadata[0].name
annotations = {
"kubernetes.io/ingress.class" = "inet"
"kubernetes.io/tls-acme" = "true"
"certmanager.k8s.io/cluster-issuer" = "letsencrypt-prod"
"ingress.kubernetes.io/ssl-redirect" = "true"
}
}
spec {
tls {
hosts = [var.domain_address]
}
rule {
host = var.domain_address
http {
path {
path = "/"
backend {
service_name = kubernetes_service.pypi.metadata[0].name
service_port = "http"
}
}
}
}
}
}
Let me know if more info is required, and I will update my question text with whatever is missing. And lastly I will let the terraform code git repo stay up and serve as help for others.
The answer to my question was that I had to add a cert-manager to my cluster, and as far as I can tell there are no native Terraform resources to create it. I ended up using Helm for both my ingress controller and cert-manager.
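For reference, installing cert-manager through Terraform's Helm provider can look roughly like this (a minimal sketch; the release name, namespace and chart settings are assumptions rather than my actual code, and create_namespace needs a reasonably recent helm provider):

# Install cert-manager via the Helm provider; the ClusterIssuer referencing
# the ACME account is created separately once cert-manager's CRDs exist
resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  namespace        = "cert-manager"
  create_namespace = true

  set {
    name  = "installCRDs"
    value = "true"
  }
}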
The setup ended up a bit more complex than I initially imagined, and as it stands it needs to be run twice. This is due to the kubeconfig not being updated (you have to run "set KUBECONFIG=.kubeconfig" before running "terraform apply" a second time). So it's not pretty, but it "works" as a minimal example to get your deployment up and running.
There are definitely ways of simplifying the pypi deployment part using native Terraform resources, and there is probably an easy fix for the kubeconfig not being updated, but I have not had time to investigate further.
If anyone has tips for a more elegant, functional and (probably most of all) secure minimal Terraform setup for a k8s cluster, I would love to hear them!
Anyway, for those interested, the resulting Terraform code can be found here.