Nextflow doesn't use the right service account to deploy workflows to kubernetes - kubernetes

We're trying to use nextflow on a k8s namespace other than our default, the namespace we're using is nextflownamespace. We've created our PVC and ensured the default service account has an admin rolebinding. We're getting an error that nextflow can't access the PVC:
"message": "persistentvolumeclaims \"my-nextflow-pvc\" is forbidden:
User \"system:serviceaccount:mynamespace:default\" cannot get resource
\"persistentvolumeclaims\" in API group \"\" in the namespace \"nextflownamespace\"",
In that error we see that system:serviceaccount:mynamespace:default is incorrectly pointing to our default namespace, mynamespace, not nextflownamespace which we created for nextflow use.
We tried adding debug.yaml = true to our nextflow.config but couldn't find the YAML it submits to k8s to validate the error. Our config file looks like this:
profiles {
standard {
k8s {
executor = "k8s"
namespace = "nextflownamespace"
cpus = 1
memory = 1.GB
debug.yaml = true
}
aws{
endpoint = "https://s3.nautilus.optiputer.net"
}
}
We did verify that when we change the namespace to another arbitrary value the error message used the new arbitrary namespace, but the service account name continued to point to the users default namespace erroneously.
We've tried every variant of profiles.standard.k8s.serviceAccount = "system:serviceaccount:nextflownamespace:default" that we could think of but didn't get any change with those attempts.

I think it's best to avoid using nested config profiles with Nextflow. I would either remove the 'standard' layer from your profile or just make 'standard' a separate profile:
profiles {
standard {
process.executor = 'local'
}
k8s {
executor = "k8s"
namespace = "nextflownamespace"
cpus = 1
memory = 1.GB
debug.yaml = true
}
aws{
endpoint = "https://s3.nautilus.optiputer.net"
}
}

Related

What does this error mean when trying to use an AppRole from Vault on an Ingress deployment?

Context
We were trying to fix an inconsistency between Terraform and our cloud provider because a database was deleted through the cloud's UI console and the changes were not properly imported into Terraform.
For reasons we preferred to not do terraform import and proceeded to change the state file to remove all references to that database hoping that would allow us to run things like plan, and it did work, but we came across other issues...
Oh, I should add that we run things like Helm through Terraform to set up our Kubernetes infra as well.
The problem
Now Terraform makes a plan to remove a Google Container Node Pool (desired outcome) and to update a Kubernetes resource of kind Ingress. The latter change is not really intended, although it could be because there's a Terraform module dependency between the module that sets up all the cluster (including node pools) and the module that sets up Ingress.
Now the issue comes from updating that Ingress. Here's the plan:
# Terraform will read AppRole from Vault
data "vault_approle_auth_backend_role_id" "role" {
- backend = "approle" -> null
~ id = "auth/approle/role/nginx-ingress/role-id" -> (known after apply)
~ role_id = "<some UUID>" -> (known after apply)
role_name = "nginx-ingress"
}
# Now this is the resource that makes everything blow up
resource "helm_release" "nginx-ingress" {
atomic = false
chart = ".terraform/modules/nginx-ingress/terraform/../helm"
...
...
- set_sensitive {
- name = "appRole.roleId" -> null
- value = (sensitive value)
}
+ set_sensitive {
+ name = "appRole.roleId"
+ value = (sensitive value)
}
- set_sensitive {
- name = "appRole.secretId" -> null
- value = (sensitive value)
}
+ set_sensitive {
+ name = "appRole.secretId"
+ value = (sensitive value)
}
}
And here's the error message we get:
When expanding the plan for module.nginx-ingress.helm_release.nginx-ingress to
include new values learned so far during apply, provider
"registry.terraform.io/hashicorp/helm" produced an invalid new value for
.set_sensitive: planned set element
cty.ObjectVal(map[string]cty.Value{"name":cty.StringVal("appRole.secretId"),
"type":cty.NullVal(cty.String),
"value":cty.StringVal("<some other UUID>")}) does not
correlate with any element in actual.
This is a bug in the provider, which should be reported in the provider's own
issue tracker.
What we tried
We thought that maybe the AppRole's secretId had rotated or changed, so we took the secretId from the State of another environment that uses the same AppRole from the same Vault and set it in our modified state file. That didn't work.

fetch and update particular field using terraform

i have a scenario,
How to fetch particular field value and also update particular field value?
For example :
Im deploying an application using terraform "kubernetes_deployment" resource configured with environment variables(endpoint=abc) and replicas=2.
resource "kubernetes_deployment" “app” {
…..….
spec {
replicas = 2
template {
spec {
….
env {
name = “ENDPOINT”
value = “abc”
}
}
Once i deployed using terraform script, another script might change configurations replicas=5 and environment values(endpoint=xyz)
Now i need to update only replicas to 20(if replicas < 20) through terraform script without changing the environment values(endpoint=abc)?
resource "kubernetes_deployment" “app” {
…..….
spec {
replicas = 20 -> only this has to reflect in apply
template {
spec {
….
env {
name = “ENDPOINT”
value = “abc”
}
}
How can i fetch particular field(replicas) to compare if replicas count > 20 and update only replicas count?
Can someone with more Terraform experience help me on this?
Inside the "kubernetes_deployment" resource block, consider adding a lifecycle block. Use it to ignore changes to resource attributes that can be made outside of Terraform's knowledge.
Provide a list of resource attributes to "ignore_changes", which Terrform would ignore in subsequent runs. The arguments are the relative address of the attributes in the resource. Map and list elements can be referenced using index notation.
lifecycle {
ignore_changes = [spec["env"]]
}
Reference: https://www.terraform.io/docs/language/meta-arguments/lifecycle.html#ignore_changes

Create GKE cluster and namespace with Terraform

I need to create GKE cluster and then create namespace and install db through helm to that namespace. Now I have gke-cluster.tf that creates cluster with node pool and helm.tf, that has kubernetes provider and helm_release resource. It first creates cluster, but then tries to install db but namespace doesn't exist yet, so I have to run terraform apply again and it works. I want to avoid scenario with multiple folder and run terraform apply only once. What's the good practice for situaction like this? Thanks for the answers.
The create_namespace argument of helm_release resource can help you.
create_namespace - (Optional) Create the namespace if it does not yet exist. Defaults to false.
https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release#create_namespace
Alternatively, you can define a dependency between the namespace resource and helm_release like below:
resource "kubernetes_namespace" "prod" {
metadata {
annotations = {
name = "prod-namespace"
}
labels = {
namespace = "prod"
}
name = "prod"
}
}
resource "helm_release" "arango-crd" {
name = "arango-crd"
chart = "./kube-arangodb-crd"
namespace = "prod"
depends_on = [ kubernetes_namespace.prod ]
}
The solution posted by user adp is correct but I wanted to give more insight on using Terraform for this particular example in regards of running single commmand:
$ terraform apply --auto-approve.
Basing on following comments:
Can you tell how are you creating your namespace? Is it with kubernetes provider? - Dawid Kruk
resource "kubernetes_namespace" - Jozef Vrana
This setup needs specific order of execution. First the cluster, then the resources. By default Terraform will try to create all of the resources at the same time. It is crucial to use a parameter depends_on = [VALUE].
The next issue is that the kubernetes provider will try to fetch the credentials at the start of the process from ~/.kube/config. It will not wait for the cluster provisioning to get the actual credentials. It could:
fail when there is no .kube/config
fetch credentials for the wrong cluster.
There is ongoing feature request to resolve this kind of use case (also there are some workarounds):
Github.com: Hashicorp: Terraform: Issue: depends_on for providers
As an example:
# Create cluster
resource "google_container_cluster" "gke-terraform" {
project = "PROJECT_ID"
name = "gke-terraform"
location = var.zone
initial_node_count = 1
}
# Get the credentials
resource "null_resource" "get-credentials" {
depends_on = [google_container_cluster.gke-terraform]
provisioner "local-exec" {
command = "gcloud container clusters get-credentials ${google_container_cluster.gke-terraform.name} --zone=europe-west3-c"
}
}
# Create a namespace
resource "kubernetes_namespace" "awesome-namespace" {
depends_on = [null_resource.get-credentials]
metadata {
name = "awesome-namespace"
}
}
Assuming that you had earlier configured cluster to work on and you didn't delete it:
Credentials for Kubernetes cluster are fetched.
Terraform will create a cluster named gke-terraform
Terraform will run a local command to get the credentials for gke-terraform cluster
Terraform will create a namespace (using old information):
if you had another cluster in .kube/config configured, it will create a namespace in that cluster (previous one)
if you deleted your previous cluster, it will try to create a namespace in that cluster and fail (previous one)
if you had no .kube/config it will fail on the start
Important!
Using "helm_release" resource seems to get the credentials when provisioning the resources, not at the start!
As said you can use helm provider to provision the resources on your cluster to avoid the issues I described above.
Example on running a single command for creating a cluster and provisioning resources on it:
variable zone {
type = string
default = "europe-west3-c"
}
resource "google_container_cluster" "gke-terraform" {
project = "PROJECT_ID"
name = "gke-terraform"
location = var.zone
initial_node_count = 1
}
data "google_container_cluster" "gke-terraform" {
project = "PROJECT_ID"
name = "gke-terraform"
location = var.zone
}
resource "null_resource" "get-credentials" {
# do not start before resource gke-terraform is provisioned
depends_on = [google_container_cluster.gke-terraform]
provisioner "local-exec" {
command = "gcloud container clusters get-credentials ${google_container_cluster.gke-terraform.name} --zone=${var.zone}"
}
}
resource "helm_release" "mydatabase" {
name = "mydatabase"
chart = "stable/mariadb"
# do not start before the get-credentials resource is run
depends_on = [null_resource.get-credentials]
set {
name = "mariadbUser"
value = "foo"
}
set {
name = "mariadbPassword"
value = "qux"
}
}
Using above configuration will yield:
data.google_container_cluster.gke-terraform: Refreshing state...
google_container_cluster.gke-terraform: Creating...
google_container_cluster.gke-terraform: Still creating... [10s elapsed]
<--OMITTED-->
google_container_cluster.gke-terraform: Still creating... [2m30s elapsed]
google_container_cluster.gke-terraform: Creation complete after 2m38s [id=projects/PROJECT_ID/locations/europe-west3-c/clusters/gke-terraform]
null_resource.get-credentials: Creating...
null_resource.get-credentials: Provisioning with 'local-exec'...
null_resource.get-credentials (local-exec): Executing: ["/bin/sh" "-c" "gcloud container clusters get-credentials gke-terraform --zone=europe-west3-c"]
null_resource.get-credentials (local-exec): Fetching cluster endpoint and auth data.
null_resource.get-credentials (local-exec): kubeconfig entry generated for gke-terraform.
null_resource.get-credentials: Creation complete after 1s [id=4191245626158601026]
helm_release.mydatabase: Creating...
helm_release.mydatabase: Still creating... [10s elapsed]
<--OMITTED-->
helm_release.mydatabase: Still creating... [1m40s elapsed]
helm_release.mydatabase: Creation complete after 1m44s [id=mydatabase]

Specify namespace when creating kubernetes PV/PVC with Terraform

I am trying to create PV/PVC on a kubernetes GKE cluster using Terraform
However the documentation does not mention how can one specify the namespace that these resources should be created in.
I have tried adding it both in the spec and the metadata section but I get an error message:
resource "kubernetes_persistent_volume" "jenkins-persistent-volume" {
metadata {
name = "${var.kubernetes_persistent_volume_metadata_name}"
# tried placing it here -->> namespace = "${var.kubernetes_jenkins_namespace}"
}
spec {
# tried placing it here -->> namespace = "${var.kubernetes_jenkins_namespace}"
capacity = {
storage = "${var.kubernetes_persistent_volume_spec_capacity_storage}"
}
storage_class_name = "standard"
access_modes = ["ReadWriteMany"]
persistent_volume_source {
gce_persistent_disk {
fs_type = "ext4"
pd_name = "${google_compute_disk.jenkins-disk.name}"
}
}
}
}
Error: module.jenkins.kubernetes_persistent_volume.jenkins-persistent-volume: spec.0: invalid or unknown key: namespace
Where such a configuration be placed?
Persistent volumes are cluster-global objects and do not live in specific namespaces. ("It is a resource in the cluster just like a node is a cluster resource.") Correspondingly you can't include a namespace name anywhere on a kubernetes_persistent_volume resource.
If you're running in a cloud environment (and here your PV is creating a Google storage volume) its typical to only create a persistent volume claim, and let the cluster allocate the underlying volume for you. PVCs are namespace-scoped, and the Terraform kubernetes_persistent_volume_claim resource explicitly documents that you can include a namespace in the metadata block.

How to issue letsencrypt certificate for k8s (AKS) using terraform resources?

Summary
I am unable to issue a valid certificate for my terraform kubernetes cluster on azure aks. The domain and certificate is successfully created (cert is created according to crt.sh), however the certificate is not applied to my domain and my browser reports "Kubernetes Ingress Controller Fake Certificate" as the applied certificate.
The terraform files are converted to the best of my abilities from a working set of yaml files (that issues certificates just fine). See my terraform code here.
UPDATE! In the original question I was also unable to create certificates. This was fixed by using the "tls_cert_request" resource from here. The change is included in my updated code below.
Here a some things I have checked out and found NOT to be the issue
The number of issued certificates from acme letsencrypt is not above rate-limits for either staging or prod.
I get the same "Fake certificate" error using both staging or prod certificate server.
Here are some areas that I am currently investigating as potential sources for the error.
I do not see a terraform-equivalent of the letsencrypt yaml input "privateKeySecretRef" and consequently what the value of my deployment ingress "certmanager.k8s.io/cluster-issuer" should be.
If anyone have any other suggestions, I would really appreciate to hear them (as this has been bugging me for quite some time now)!
Certificate Resources
provider "acme" {
server_url = var.context.cert_server
}
resource "tls_private_key" "reg_private_key" {
algorithm = "RSA"
}
resource "acme_registration" "reg" {
account_key_pem = tls_private_key.reg_private_key.private_key_pem
email_address = var.context.email
}
resource "tls_private_key" "cert_private_key" {
algorithm = "RSA"
}
resource "tls_cert_request" "req" {
key_algorithm = "RSA"
private_key_pem = tls_private_key.cert_private_key.private_key_pem
dns_names = [var.context.domain_address]
subject {
common_name = var.context.domain_address
}
}
resource "acme_certificate" "certificate" {
account_key_pem = acme_registration.reg.account_key_pem
certificate_request_pem = tls_cert_request.req.cert_request_pem
dns_challenge {
provider = "azure"
config = {
AZURE_CLIENT_ID = var.context.client_id
AZURE_CLIENT_SECRET = var.context.client_secret
AZURE_SUBSCRIPTION_ID = var.context.azure_subscription_id
AZURE_TENANT_ID = var.context.azure_tenant_id
AZURE_RESOURCE_GROUP = var.context.azure_dns_rg
}
}
}
Pypiserver Ingress Resource
resource "kubernetes_ingress" "pypi" {
metadata {
name = "pypi"
namespace = kubernetes_namespace.pypi.metadata[0].name
annotations = {
"kubernetes.io/ingress.class" = "inet"
"kubernetes.io/tls-acme" = "true"
"certmanager.k8s.io/cluster-issuer" = "letsencrypt-prod"
"ingress.kubernetes.io/ssl-redirect" = "true"
}
}
spec {
tls {
hosts = [var.domain_address]
}
rule {
host = var.domain_address
http {
path {
path = "/"
backend {
service_name = kubernetes_service.pypi.metadata[0].name
service_port = "http"
}
}
}
}
}
}
Let me know if more info is required, and I will update my question text with whatever is missing. And lastly I will let the terraform code git repo stay up and serve as help for others.
The answer to my question was that I had to include a cert-manager to my cluster and as far as I can tell there are no native terraform resources to create it. I ended up using Helm for my ingress and cert manager.
The setup ended up a bit more complex than I initially imagined, and as it stands now it needs to be run twice. This is due to the kubeconfig not being updated (have to apply "set KUBECONFIG=.kubeconfig" before running "terraform apply" a second time). So it's not pretty, but it "works" as a minimum example to get your deployment up and running.
There definitively are ways of simplifying the pypi deployment part using native terraform resources, and there is probably an easy fix to the kubeconfig not being updated. But I have not had time to investigate further.
If anyone have tips for a more elegant, functional and (probably most of all) secure minimum terraform setup for a k8s cluster I would love to hear it!
Anyways, for those interested, the resulting terraform code can be found here