AWS CDK Stack is deployed silently even if not explicitly specified - aws-cloudformation

I have a multi-stack CDK setup: the core stack contains the VPC and EKS cluster, and a "Tenant" stack deploys some S3 buckets, Kubernetes namespaces, and some other tenant-related resources.
cdk ls displays all the existing stacks as expected:
- eks-stack
- tenant-a
- tenant-b
If I want to deploy only a single tenant stack, I run cdk deploy tenant-a. To my surprise, I see that in my k8s cluster the manifests of both tenant-a and tenant-b were deployed, not just tenant-a as I expected.
The CDK CLI output correctly reports that tenant-a was deployed and doesn't mention tenant-b. I also see that most of the changes actually happened inside the eks stack rather than in the tenant stack, because I am passing references between the stacks.
# app.py
# ...
# EKS
eks_cluster_stack = EksStack(
    app,
    "eks-stack",
    stack_log_level="INFO",
)

# Tenant-specific stacks
tenants = ['tenant-a', 'tenant-b']
for tenant in tenants:
    tenant_stack = TenantStack(
        app,
        tenant,
        stack_log_level="INFO",
        cluster=eks_cluster_stack.eks_cluster,
        tenant=tenant,
    )
#
# Inside TenantStack.py a manifest is applied to k8s
self.cluster.add_manifest(f'db-job-{self.tenant}', {
    "apiVersion": 'v1',
    "kind": 'Pod',
    "metadata": {"name": 'mypod'},
    "spec": {
        "serviceAccountName": "bootstrap-db-job-access-ssm",
        "containers": [
            {
                "name": 'hello',
                "image": 'amazon/aws-cli',
                "command": ['magic stuff ....'],
            }
        ]
    }
})
I found out that when I import the cluster by its attributes instead of passing the Cluster object as a reference, e.g.
self.cluster = Cluster.from_cluster_attributes(
    self, 'cluster',
    cluster_name=cluster,
    open_id_connect_provider=eks_open_id_connect_provider,
    kubectl_role_arn=kubectl_role,
)
I can deploy the tenant-a and tenant-b stacks separately and my core eks stack stays untouched. However, I have read that it's recommended to use references, since CDK can then automatically create dependencies and detect circular dependencies.

There is an option to exclude dependencies: run cdk deploy tenant-a --exclusively to deploy only the requested stack and skip its dependencies.
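If you want to remove the implicit dependency altogether, importing the cluster by attributes inside TenantStack (as described above) also works. A minimal sketch, using CDK v2 imports; the construct IDs, cluster name, and ARNs below are placeholders:

from aws_cdk import aws_eks as eks, aws_iam as iam

# Inside TenantStack: import the cluster from plain attribute values instead of
# receiving the Cluster object from EksStack, so no cross-stack reference
# (and no implicit stack dependency) is created.
oidc_provider = iam.OpenIdConnectProvider.from_open_id_connect_provider_arn(
    self, "oidc-provider",
    "arn:aws:iam::111122223333:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE",  # placeholder
)
self.cluster = eks.Cluster.from_cluster_attributes(
    self, "cluster",
    cluster_name="eks-cluster",                                      # placeholder
    kubectl_role_arn="arn:aws:iam::111122223333:role/kubectl-role",  # placeholder
    open_id_connect_provider=oidc_provider,
)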

Related

Why isn't my `KubernetesPodOperator` using the IRSA I've annotated worker pods with?

I've deployed an EKS cluster using the Terraform module terraform-aws-modules/eks/aws. I’ve deployed Airflow on this EKS cluster with Helm (the official chart, not the community one), and I’ve annotated worker pods with the following IRSA:
serviceAccount:
  # Specifies whether a ServiceAccount should be created
  create: true
  # The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the release name
  name: "airflow-worker"
  # Annotations to add to worker kubernetes service account.
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789:role/airflow-worker"
This airflow-worker role has a policy attached to it to enable it to assume a different role.
I have a Python program that assumes this other role and performs some S3 operations. I can exec into a running BashOperator pod, open a Python shell, assume this role, and issue the exact same S3 operations successfully.
But, when I create a Docker image with this program and try to call it from a KubernetesPodOperator task, I see the following error:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the AssumeRole operation:
User: arn:aws:sts::123456789:assumed-role/core_node_group-eks-node-group-20220726041042973200000001/i-089c64b96cf7878d8 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::987654321:role/TheOtherRole
I don't really know what this role is, but I believe it was created automatically by the Terraform module. However, when I kubectl describe one of these failed pods, I see this:
Environment:
  ...
  ...
  ...
  AWS_ROLE_ARN: arn:aws:iam::123456789:role/airflow-worker
My questions:
Why is this role being used, and not the IRSA airflow-worker that I've specified in the Helm chart's values?
What even is this role? It seems the Terraform module creates a number of roles automatically, but it is very difficult to tell what their purpose is or where they're used from the Terraform documentation.
How am I able to assume this role and do everything the Dockerized Python program does when in a shell in the pod? (Answering my own question: this is because other operators, such as BashOperator, do use the airflow-worker role; KubernetesPodOperator pods do not.)
What is the AWS_ROLE_ARN environment variable, and why isn't it being used?
Happy to provide more context if it's helpful.
In order to assume the other role from an EKS pod, you need to add this trust policy to that role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789:role/airflow-worker"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
You can find more information in the AWS Security Token Service (STS) documentation.
Tasks running in the worker pod will use the role automatically, but if you create a new pod, it is separate from your worker pod, so you need to make it use the service account that the role is attached to in order for the AWS role credentials to be injected into that pod.
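For reference, the assume-role flow the question's Python program performs could look like the sketch below once the pod runs under the airflow-worker service account (the session name and bucket name are placeholders); boto3 picks up the injected web-identity credentials automatically, and the explicit AssumeRole call is what the trust policy above authorizes:

import boto3

# boto3's default credential chain uses the IRSA web-identity credentials
# injected into the pod (AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE).
sts = boto3.client("sts")
response = sts.assume_role(
    RoleArn="arn:aws:iam::987654321:role/TheOtherRole",  # role from the question
    RoleSessionName="airflow-task",                      # placeholder session name
)
creds = response["Credentials"]

# Use the temporary credentials of the assumed role for the S3 operations.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(s3.list_objects_v2(Bucket="example-bucket").get("KeyCount", 0))  # placeholder bucket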
This is pretty much by design. Operators other than KubernetesPodOperator use an auto-generated pod template file that has the Helm chart values as default properties, while KubernetesPodOperator needs its own pod template file, or it essentially creates one from the arguments passed to the KubernetesPodOperator constructor.
I fixed the ultimate issue by passing service_account_name="airflow-worker" to the KubernetesPodOperator constructor.
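A minimal sketch of that fix (the task ID, namespace, and image are placeholders; the import path and the service_account_name parameter come from the cncf.kubernetes provider and may differ slightly between provider versions):

from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

run_s3_job = KubernetesPodOperator(
    task_id="run_s3_job",
    name="run-s3-job",
    namespace="airflow",  # placeholder namespace
    image="123456789.dkr.ecr.us-east-1.amazonaws.com/my-s3-program:latest",  # placeholder image
    # Run the launched pod under the IRSA-annotated service account instead of
    # falling back to the node group's instance role.
    service_account_name="airflow-worker",
)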

Add existing GKE cluster to Terraform state file

Let's assume I have an existing GKE cluster that contains all my applications. They were all deployed using different methods. Now I want to deploy some resources to that cluster using Terraform. The trouble here is that Terraform doesn't see the cluster in its state file, so it can't interact with it. Another problem is that even if I get that cluster into my state file, Terraform doesn't see all of the resources created in that cluster. This could lead to conflicts, e.g. trying to deploy two resources with the same name. Is there a way to solve this problem, or do I just have to deal with the reality of my existence and create a new cluster for every new project that I deploy with Terraform?
You can use the terraform import command to import your existing GKE cluster into the Terraform state. Before running it, you need an adequate Terraform configuration for your cluster.
Example of the import command:
terraform import google_container_cluster.<TF_RESOURCE_NAME> projects/<PROJECT_ID>/locations/<YOUR-CLUSTER-ZONE>/clusters/<CLUSTER_NAME>
for a Terraform configuration like:
resource "google_container_cluster" "<TF_RESOURCE_NAME>" {
name = "<CLUSTER_NAME>"
location = "<YOUR-CLUSTER-ZONE>"
}
The CLUSTER_NAME is the name displayed in your GKE clusters list on Google Cloud Console.
Then you also need to import the cluster node pool(s) in the same way, using the Terraform google_container_node_pool resource.

Kubernetes does not update Docker image on redeployment [duplicate]

This question already has answers here:
Kubernetes how to make Deployment to update image
(8 answers)
Closed 2 years ago.
Automation builds a Docker image with the microservice and pushes this image into a JFrog Artifactory registry, tagged by branch name as registry/service-name:branch. In the next step it applies the Kubernetes YAML manifest file, and the application starts after the image is pulled on the appropriate Kubernetes node.
The problem is the following: when I push changes to the microservice source code into the repository, the automation:
rebuilds the project and pushes the updated Docker image into the registry with the same tag (branch)
redeploys the microservice in Kubernetes
the microservice is redeployed, but with the old image
I guess this occurs because there are no changes in the 'Deployment' section of the Kubernetes YAML manifest file, so Kubernetes does not pull the updated image from the JFrog registry. As a workaround, I insert a timestamp annotation into the template section on each redeployment:
"template": {
"metadata": {
"labels": {
"app": "service-name"
},
"annotations": {
"timestamp": "1588246422"
But the miracle did not happen: the image is updated only when I delete the Kubernetes Deployment and redeploy the application (maybe in that case it just starts on another node, where a docker pull is necessary).
Is it possible to set up Kubernetes or configure the manifest file somehow to force Kubernetes to pull the image on each redeployment?
I would suggest tagging the images in the pattern registry/service-name:branch-git-sha or registry/service-name:git-sha, which makes the new image be pulled automatically because the tag changes with every build.
Or, as a workaround, you can keep the current image tagging scheme and add an environment variable in the template which gets set to the timestamp.
Changing the environment variable will always result in the pods being restarted, and together with imagePullPolicy: Always this forces a fresh pull.
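If the automation happens to be written in Python, a minimal sketch of the unique-tag approach using the official Kubernetes client could look like this (the Deployment name, namespace, and container name service-name are assumptions):

from kubernetes import client, config

def redeploy(git_sha: str, branch: str = "main") -> None:
    """Point the pod template at a unique, SHA-based image tag so the
    Deployment rolls out new pods and the kubelet pulls a fresh image."""
    config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment(
        name="service-name",   # assumed Deployment name
        namespace="default",   # assumed namespace
        body={
            "spec": {
                "template": {
                    "spec": {
                        "containers": [
                            {
                                "name": "service-name",  # assumed container name
                                "image": f"registry/service-name:{branch}-{git_sha}",
                            }
                        ]
                    }
                }
            }
        },
    )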

kubefed init says "waiting for the federation control plane to come up" and it never comes up

I've created clusters using the kops command. For each cluster I have to create a hosted zone and add its nameserver (NS) records to the DNS provider. To create a hosted zone, I created a subdomain of the AWS hosted zone (example.com) using the following command:
ID=$(uuidgen) && aws route53 create-hosted-zone --name subdomain1.example.com --caller-reference $ID | jq .DelegationSet.NameServers
The nameservers returned by the above command are included in a newly created file subdomain1.json with the following content:
{
  "Comment": "Create a subdomain NS record in the parent domain",
  "Changes": [
    {
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "subdomain1.example.com",
        "Type": "NS",
        "TTL": 300,
        "ResourceRecords": [
          {
            "Value": "ns-1.awsdns-1.co.uk"
          },
          {
            "Value": "ns-2.awsdns-2.org"
          },
          {
            "Value": "ns-3.awsdns-3.com"
          },
          {
            "Value": "ns-4.awsdns-4.net"
          }
        ]
      }
    }
  ]
}
To get the parent-zone-id, I've used the following command:
aws route53 list-hosted-zones | jq '.HostedZones[] | select(.Name=="example.com.") | .Id'
To apply the subdomain NS records to the parent hosted zone-
aws route53 change-resource-record-sets --hosted-zone-id <parent-zone-id> --change-batch file://subdomain1.json
then I created a cluster using kops command-
kops create cluster --name=subdomain1.example.com --master-count=1 --master-zones ap-southeast-1a --node-count=1 --zones=ap-southeast-1a --authorization=rbac --state=s3://example.com --kubernetes-version=1.11.0 --yes
I'm able to create a cluster, validate it and get its nodes. By using the same procedure, I created one more cluster (subdomain2.example.com).
I've set aliases for the two clusters using these commands-
kubectl config set-context subdomain1 --cluster=subdomain1.example.com --user=subdomain1.example.com
kubectl config set-context subdomain2 --cluster=subdomain2.example.com --user=subdomain2.example.com
To set up federation between these two clusters, I've used these commands-
kubectl config use-context subdomain1
kubectl create clusterrolebinding admin-to-cluster-admin-binding --clusterrole=cluster-admin --user=admin
kubefed init interstellar --host-cluster-context=subdomain1 --dns-provider=aws-route53 --dns-zone-name=example.com
The output of the kubefed init command should show that the federation control plane came up successfully. But for me it shows "waiting for the federation control plane to come up...", and it never comes up. What might be the error?
I've followed the following tutorial to create 2 clusters.
https://gist.github.com/arun-gupta/02f534c7720c8e9c9a875681b430441a
There was a problem with the default image used for the federation API server and controller manager binaries. By default, the image mentioned below is used by the kubefed init command:
"gcr.io/k8s-jkns-e2e-gce-federation/fcp-amd64:v0.0.0-master_$Format:%h$".
But this image is old and no longer available; the federation control plane tries to pull the image but fails. This is the error I was getting.
To rectify it, build an fcp image of your own, push it to some repository, and use that image in the kubefed init command. Below are the instructions to execute (run all of these commands from "$GOPATH/src/k8s.io/kubernetes/federation"):
To create the fcp image and push it to a repository:
docker load -i _output/release-images/amd64/fcp-amd64.tar
docker tag gcr.io/google_containers/fcp-amd64:v1.9.0-alpha.2.60_430416309f9e58-dirty REGISTRY/REPO/IMAGENAME[:TAG]
docker push REGISTRY/REPO/IMAGENAME[:TAG]
Now create a federation control plane with the following command:
_output/dockerized/bin/linux/amd64/kubefed init myfed --host-cluster-context=HOST_CLUSTER_CONTEXT --image=REGISTRY/REPO/IMAGENAME[:TAG] --dns-provider="PROVIDER" --dns-zone-name="YOUR_ZONE" --dns-provider-config=/path/to/provider.conf

terraforming with dependent providers

In my Terraform infrastructure, I spin up several Kubernetes clusters based on parameters, then install some standard content into those clusters using the kubernetes provider.
When I change the parameters and one of the clusters is no longer needed, Terraform is unable to tear it down because the provider and the resources are both in the module. I don't see an alternative, however, because I create the Kubernetes cluster in that same module, and the kubernetes objects are all per cluster.
All solutions I can think of involve adding a bunch of boilerplate to my terraform config. Should I consider generating my terraform config from a script?
I made a git repo that shows exactly the problems I'm having:
https://github.com/bukzor/terraform-gke-k8s-demo
TL;DR
Two solutions:
Create two separate modules with Terraform
Use interpolations and depends_on between the code that creates your Kubernetes cluster and the kubernetes resources:
resource "kubernetes_service" "example" {
metadata {
name = "my-service"
}
depends_on = ["aws_vpc.kubernetes"]
}
resource "aws_vpc" "kubernetes" {
...
}
When destroying resources
You are encountering a dependency lifecycle issue
PS: I don't know the code you've used to create / provision your Kubernetes cluster, but I guess it looks like this:
Write code for the Kubernetes cluster (creates a VPC)
Apply it
Write code for provisioning Kubernetes (create a Service that creates an ELB)
Apply it
Try to destroy everything => Error
What is happening is that by creating a LoadBalancer Service, Kubernetes will provision an ELB on AWS. But Terraform doesn't know that, and there is no link between the created ELB and any other resources managed by Terraform.
So when terraform tries to destroy the resources in the code, it will try to destroy the VPC. But it can't because there is an ELB inside that VPC that terraform doesn't know about.
The first thing would be to make sure that Terraform "deprovisions" what is inside the Kubernetes cluster and then destroys the cluster itself.
Two solutions here:
Use different modules so there is no dependency lifecycle issue. For example, the first module could be k8s-infra and the other could be k8s-resources. The first one manages the skeleton of Kubernetes and is applied first / destroyed last. The second one manages what is inside the cluster and is applied last / destroyed first.
Use the depends_on parameter to write the dependency lifecycle explicitly
When creating resources
You might also run into a dependency issue when terraform apply cannot create resources even though nothing is applied yet. I'll give another example with PostgreSQL:
Write code to create an RDS PostgreSQL server
Apply it with Terraform
Write code, in the same module, to provision that RDS instance with the postgres terraform provider
Apply it with Terraform
Destroy everything
Try to apply everything => ERROR
By debugging Terraform a bit, I've learned that all the providers are initialized at the beginning of the plan / apply, so if one has an invalid config (wrong API keys / unreachable endpoint) then Terraform will fail.
The solution here is to use the -target parameter of the plan / apply command.
Terraform will only initialize providers that are related to the resources being applied.
Apply the RDS code with the AWS provider: terraform apply -target=aws_db_instance
Apply everything with terraform apply. Because the RDS instance is already reachable, the PostgreSQL provider can also initialize itself.