terraforming with dependant providers - kubernetes

In my terraform infrastructure, I spin up several Kubernetes clusters based on parameters, then install some standard contents to those Kubernetes clusters using the kubernetes provider.
When I change the parameters and one of the clusters is no longer needed, terraform is unable to tear it down because the provider and resources are both in the module. I don't see an alternative, however, because I create the kubernetes cluster in that same module, and the kubernetes object are all per kubernetes cluster.
All solutions I can think of involve adding a bunch of boilerplate to my terraform config. Should I consider generating my terraform config from a script?
I made a git repo that shows exactly the problems I'm having:
https://github.com/bukzor/terraform-gke-k8s-demo

TL;DR
Two solutions:
Create two separate modules with Terraform
Use interpolations and depends_on between the code that creates your Kubernetes cluster and the kubernetes resources:
resource "kubernetes_service" "example" {
metadata {
name = "my-service"
}
depends_on = ["aws_vpc.kubernetes"]
}
resource "aws_vpc" "kubernetes" {
...
}
When destroying resources
You are encountering a dependency lifecycle issue
PS: I don't know the code you've used to create / provision your Kubernetes cluster but I guess it looks like this
Write code for the Kubernetes cluster (creates a VPC)
Apply it
Write code for provisionning Kubernetes (create an Service that creates an ELB)
Apply it
Try to destroy everything => Error
What is happenning is that by creating a LoadBalancer Service, Kubernetes will provision an ELB on AWS. But Terraform doesn't know that and there is no link between the ELB created and any other resources managed by Terraform.
So when terraform tries to destroy the resources in the code, it will try to destroy the VPC. But it can't because there is an ELB inside that VPC that terraform doesn't know about.
The first thing would be to make sure that Terraform "deprovision" the Kubernetes cluster and then destroy the cluster itself.
Two solutions here:
Use different modules so there is no dependency lifecycle. For example the first module could be k8s-infra and the other could be k8s-resources. The first one manages all the squeleton of Kubernetes and is apply first / destroy last. The second one manages what is inside the cluster and is apply last / destroy first.
Use the depends_on parameter to write the dependency lifecycle explicitly
When creating resources
You might also ran into a dependency issue when terraform apply cannot create resources even if nothing is applied yet. I'll give an other example with a postgres
Write code to create an RDS PostgreSQL server
Apply it with Terraform
Write code, in the same module, to provision that RDS instance with the postgres terraform provider
Apply it with Terraform
Destroy everything
Try to apply everything => ERROR
By debugging Terraform a bit I've learned that all the providers are initialized at the beggining of the plan / apply so if one has an invalid config (wrong API keys / unreachable endpoint) then Terraform will fail.
The solution here is to use the target parameter of a plan / apply command.
Terraform will only initialize providers that are related to the resources that are applied.
Apply the RDS code with the AWS provider: terraform apply -target=aws_db_instance
Apply everything terraform apply. Because the RDS instance is already reachable, the PostgreSQL provider can also initiate itself

Related

How to make Terraform provider dependent on a resource being created

I am trying to utilize Rancher Terraform provider to create a new RKE cluster and then use the Kubernetes and Helm Terraform providers to create/deploy resources to the created cluster. I'm using this https://registry.terraform.io/providers/rancher/rancher2/latest/docs/resources/cluster_v2#kube_config attribute to create a local file with the new cluster's kube config.
The Helm and Kubernetes providers need the kube config in the provider configuration: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs. Is there any way I can get the provider configuration to wait for the local file being created?
Generally speaking, Terraform always needs to evaluate provider configurations during the planning step because providers are allowed to rely on those settings in order to create the plan, and so it typically isn't possible to have a provider configuration refer to something created only during the apply step.
As a way to support bootstrapping in a situation like this though, this is one situation where it can be reasonable to use the -target=... option to terraform apply, to plan and apply only sufficient actions to create the Rancher cluster first, and then follow up with a normal plan and apply to complete everything else:
terraform apply -target=rancher2_cluster_v2.example
terraform apply
This two-step process is needed only for situations where the kube_config attribute isn't known yet. As long as this resource type has convergent behavior, you should be able to use just terraform apply as normal unless you in future make a change that requires replacing the cluster.
(This is a general answer about provider configurations refering to resource attributes. I'm not familiar with Rancher in particular and so there might be some specifics about that particular resource type which I'm not mentioning here.)
I found a sort of workaround solution. I output the rancher2_cluster.cluster.kube_config object into a variable. Then referenced that variable in my Kubernetes module. Instead of using kube_config attribute in the provider configuration, I used the token and host attributes and used yamldecode to parse the creds directly from the kube_config variable.
provider "kubernetes" {
token = "${yamldecode(var.kube_config)["users"][0]["user"]["token"]}"
host = "${yamldecode(var.kube_config)["clusters"][0]["cluster"]["server"]}"
}
I will suggest to split your functionality in 2 layers
Run the fist layer to generate the kube_config file.
Run the second layer that will consume this file.

Add existing GKE cluster to terraform stat file

Lets assume I have an existing GKE cluster that contains all my applications. They were all deployed using different methods. Now I want to deploy some resources to that cluster using Terraform. The trouble here is that terraform doesn't see it in his state file so it can't interact with it. Another problem is that even if I get that cluster to my state file, terraform doesn't see all of the created resources in that cluster. This could lead to some conflicts e.g. I'm trying to deploy two resources with the same name. Is there a way to solve this problem or do I just have to deal with the reality of my existence and create a new cluster for every new project that I deploy with terraform?
You can use terraform import command to import your existing GKE cluster to terraform state. Prior to run it, you need to have the adequate terraform configuration for your cluster.
example of import command :
terraform import google_container_cluster.<TF_RESOURCE_NAME> projects/<PROJECT_ID>/locations/<YOUR-CLUSTER-ZONE>/clusters/<CLUSTER_NAME>
for a terraform configuration :
resource "google_container_cluster" "<TF_RESOURCE_NAME>" {
name = "<CLUSTER_NAME>"
location = "<YOUR-CLUSTER-ZONE>"
}
The CLUSTER_NAME is the name displayed in your GKE clusters list on Google Cloud Console.
Then you need also to import the cluster node pool(s) in the same way using terraform google_container_node_pool resource.

How to include AWS EKS with CI/CD?

I am studying about CI/CD on AWS (CodePipeline/CodeBuild/CodeDeploy) and found it to be a very good tool for managing a pipeline on the cloud with everything managed (don't even need to install Jenkins on EC2).
I am now reading about container building and deployment. For the build phase, CodeBuild supports building container images. For the deploy phase, while I could find a CodeDeploy solution to ECS cluster, it seems there is no direct CodeDeploy solution for EKS (kindly correct if I am wrong).
May I know if there is a solution to integrate EKS cluster (i.e. the deploy phase can fetch the docker image from ECR or dockerhub and deploy to EKS)? I have come across some ideas using lamda functions to trigger the cluster to perform rolling update of the container image, but I could not find a step-by-step guide on this.
=========================
(Update 17 Sep 2020)
Somehow managed to create a lambda function to trigger an update to EKS to perform rolling update of the k8s deployment. Thanks Prashanna for the source base.
Just want to share the key setups in the process.
(1) Update the lambda execution role to include permission to describe EKS clusters
Create a policy with describe EKS cluster access, and attach to the role:
Policy snippet:
...
......
"Action": "eks:Describe*"
...
......
Or you can create a "EKSFullAccess" policy, and attach to the lambda execution role
(2) Update the k8s ConfigMap, and supplement the lambda execution role ARN to the mapRole section. The corresponding k8s role should be a role that has permission to update container images (say system:masters) used for the k8s deployment
You can edit the map with command like below:
kubectl edit -n kube-system configmap/aws-auth
You don't have to add/update another ConfigMap even if your deployment is in another namespace. It will take effect as well.
Sample lambda function call request and response:
Gitab provides the inbuilt integration of EKS and deployment with the help of Helm charts. If you plan to use other tools Using AWS lambda to update the image is the best bet!
I've added my github project.
Setup a lambda with below code and give RBAC access to this lambda in your EKS. Try invoking the lambda by passing the required information like namespace, deployment, image etc
Lambda for Kubernetes image update
The lambda must require EKS:describecluster policy.
The Lambda role must be provided atleast update image RBAC role in EKS cluster RBAC role setup
Since there's no built-in CI/CD for EKS at the moment, this is going to be a showcase of success/failure stories of a 3rd-party CI/CDs in EKS :) My take: https://github.com/fluxcd/flux
Pros:
Quick to set up initially (until you get into multiple teams/environments)
Tracks and deploys image releases out of box
Possibility to split what to auto-deploy in dev/prod using regex. E.g. all versions to dev, only minor to prod. Or separate tag prefixes for dev/prod.
All state is in git - a good practice to start with
Cons:
Getting complex for further pipeline expansion, e.g. blue-green, canary, auto-rollbacks, etc.
The dashboard is proprietary (weave works product)
Not for on-demand parametrized job runs like traditional CIs.
Setup:
Setup an automated image build (looks like you've already figured out)
Setup flux and helm-operator into the cluster, point them to your "gitops repo"
For each app, create a HelmRelease object that describes a regex of image tag to track
Done. A newly published image tag that falls into regex will be auto-deployed to the cluster and the new version is committed to a gitops repo.

Terraform apply, how to increment count and add kubernetes worker nodes to the existing workers?

I deployed k8s cluster on bare metal using terraform following this repository on github
Now I have three nodes:
ewr1-controller, ewr1-worker-0, ewr1-worker-1
Next, I would like to run terraform apply and increment the worker nodes (*ewr1-worker-3, ewr1-worker-4 ... *) while keeping the existing controller and worker nodes.
I tried incrementing the count.index to start from 3, however it still overwrites the existing workers.
resource "packet_device" "k8s_workers" {
project_id = "${packet_project.kubenet.id}"
facilities = "${var.facilities}"
count = "${var.worker_count}"
plan = "${var.worker_plan}"
operating_system = "ubuntu_16_04"
hostname = "${format("%s-%s-%d", "${var.facilities[0]}", "worker", count.index+3)}"
billing_cycle = "hourly"
tags = ["kubernetes", "k8s", "worker"]
}
I havent tried this but if I do
terraform state rm 'packet_device.k8s_workers'
I am assuming these worker nodes will not be managed by the kubernetes master. I don't want to create all the nodes at beginning because the worker nodes that I am adding will have different specs(instance types).
The entire script I used is available here on this github repository.
I appreciate it if someone could tell what I am missing here and how to achieve this.
Thanks!
Node resizing is best addressed using an autoscaler. Using Terraform to scale a nodepool might not be the optimal approach as the tool is meant to declare the state of the system rather than dynamically change it. The best approach for this is to use a cloud auto scaler.
In bare metal, you can implement a CloudProvider interface (like the one provided by cloud such as AWS, GCP, Azure) as described here
After implementing that, you need to determine if your K8s implementation can be operated as a provider by Terraform, and if that's the case, find the nodepool autoscaler resource that allows the autoscaling.
Wrapping up, Terraform is not meant to be used as an autoscaler given its natures as a declarative language that describes the infrastructure.
The autoscaling features in K8s are meant to tackle this kind of requirements.
I solved this issue by modifying and removing modules, resources from the terraform.state: manually modifying it and using terraform state rm <--->.
Leaving out (remove state) the section that I want to keep as they
are.
Modifying the sections in terraform.state that I want change when new servers are added.
Incrementing the counter to add new resources, see terraform
interpolation.
I am using bare-metal cloud provider to deploy k8s and it doesn't support k8s HA or VA autos-calling. This is may not be optimal solution as other have pointed out but if it is not something you need to do quite often, terraform can do the job albeit the hardway.

Automate Retrieval and Storing Kubeconfig File After Creating a Cluster with Terraform/GKE

When I use Terraform to create a cluster in GKE everything works fine and as expected.
After the cluster is created, I want to then use Terraform to deploy a workload.
My issue is, how to be able to point at the correct cluster, but I'm not sure I understand the best way of achieving this.
I want to automate the retrieval of the clusters kubeconfig file- the file which is generally stored at ~/.kube/config. This file is updated when users run this command manually to authenticate to the correct cluster.
I am aware if this file is stored on the host machine (the one I have Terraform running on) that it's possible to point at this file to authenticate to the cluster like so:
provider kubernetes {
# leave blank to pickup config from kubectl config of local system
config_path = "~/.kube/config"
}
However, running this command to generate the kubeconfig requires Cloud SDK to be installed on the same machine that Terraform is running on, and its manual execution doesn't exactly seem very elegant.
I am sure I must be missing something in how to achieve this.
Is there a better way to retrieve the kubeconfig file via Terraform from a cluster created by Terraform?
Actually, there is another way to access to fresh created gke.
data "google_client_config" "client" {}
provider "kubernetes" {
load_config_file = false
host = google_container_cluster.main.endpoint
cluster_ca_certificate = base64decode(google_container_cluster.main.master_auth.0.cluster_ca_certificate)
token = data.google_client_config.client.access_token
}
Basically in one step create you cluster. Export the kube config file to S3 for example.
In another step retrieve the file and move to the default folder. Terraform should work following this steps. Then you can apply your obejcts to cluster created previuoly.
I am deplyoing using gitlabCi pipeline, I have one repository code for k8s cluster (infra) and another with the k8s objects. The first pipeline triggers the second.