How to include AWS EKS with CI/CD?

How to include AWS EKS with CI/CD? - kubernetes

I am studying about CI/CD on AWS (CodePipeline/CodeBuild/CodeDeploy) and found it to be a very good tool for managing a pipeline on the cloud with everything managed (don't even need to install Jenkins on EC2).
I am now reading about container building and deployment. For the build phase, CodeBuild supports building container images. For the deploy phase, while I could find a CodeDeploy solution to ECS cluster, it seems there is no direct CodeDeploy solution for EKS (kindly correct if I am wrong).
May I know if there is a solution to integrate EKS cluster (i.e. the deploy phase can fetch the docker image from ECR or dockerhub and deploy to EKS)? I have come across some ideas using lamda functions to trigger the cluster to perform rolling update of the container image, but I could not find a step-by-step guide on this.
=========================
(Update 17 Sep 2020)
Somehow managed to create a lambda function to trigger an update to EKS to perform rolling update of the k8s deployment. Thanks Prashanna for the source base.
Just want to share the key setups in the process.
(1) Update the lambda execution role to include permission to describe EKS clusters
Create a policy with describe EKS cluster access, and attach to the role:
Policy snippet:
...
......
"Action": "eks:Describe*"
...
......
Or you can create a "EKSFullAccess" policy, and attach to the lambda execution role
(2) Update the k8s ConfigMap, and supplement the lambda execution role ARN to the mapRole section. The corresponding k8s role should be a role that has permission to update container images (say system:masters) used for the k8s deployment
You can edit the map with command like below:
kubectl edit -n kube-system configmap/aws-auth
You don't have to add/update another ConfigMap even if your deployment is in another namespace. It will take effect as well.
Sample lambda function call request and response:

Gitab provides the inbuilt integration of EKS and deployment with the help of Helm charts. If you plan to use other tools Using AWS lambda to update the image is the best bet!
I've added my github project.
Setup a lambda with below code and give RBAC access to this lambda in your EKS. Try invoking the lambda by passing the required information like namespace, deployment, image etc
Lambda for Kubernetes image update
The lambda must require EKS:describecluster policy.
The Lambda role must be provided atleast update image RBAC role in EKS cluster RBAC role setup

Since there's no built-in CI/CD for EKS at the moment, this is going to be a showcase of success/failure stories of a 3rd-party CI/CDs in EKS :) My take: https://github.com/fluxcd/flux
Pros:
Quick to set up initially (until you get into multiple teams/environments)
Tracks and deploys image releases out of box
Possibility to split what to auto-deploy in dev/prod using regex. E.g. all versions to dev, only minor to prod. Or separate tag prefixes for dev/prod.
All state is in git - a good practice to start with
Cons:
Getting complex for further pipeline expansion, e.g. blue-green, canary, auto-rollbacks, etc.
The dashboard is proprietary (weave works product)
Not for on-demand parametrized job runs like traditional CIs.
Setup:
Setup an automated image build (looks like you've already figured out)
Setup flux and helm-operator into the cluster, point them to your "gitops repo"
For each app, create a HelmRelease object that describes a regex of image tag to track
Done. A newly published image tag that falls into regex will be auto-deployed to the cluster and the new version is committed to a gitops repo.

Related

Container deployment with self-managed kubernetes in AWS

I am relatively new to AWS and kubernetes. I have created a self-managed kubernetes cluster running in AWS (not using EKS). I have successfully created a pipeline in AWS CodePipeline that builds my container and uploads it to ECR. Currently I am manually deploying the created image in the cluster by running the following commands:
kubectl delete deployment my-service
kubectl apply -f my-service_deployment.yaml
How can I automate this manual step in AWS CodePipeline? How can I run the above commands as part of the pipeline?
Regarding my deployment yaml files, where should I store these files? (currently I store them locally in the master node.)
I am missing some best practices for this process.

Your yaml manifests should'nt be on your master node (never), they should be stored in a Version Control System (just like github/gitlab/bitbucket etc.).
To automate the deployment of your docker image based on new artifact version in ECR, you can use a great tools named FluxCD, it is actually very simple to install (https://fluxcd.io/docs/get-started/) and you can easily configure it to automatically deploy your images in your cluster each time there is a new image on your ECR registry.
This way your codePipeline will build the code, do the tests, build the image, tag it and push it to ECR and FluxCD will deploy it to kubernetes. (it is also natively configurable to deploy on each X minutes (based on your configuration) on your cluster, so even if you bring a little change into your manifests, it will be automatically deployed !
bguess

you can also make use of argo cd its very easy to install and use compared to aws codepipeline.
argo cd was specifically designed for Kubernetes thus offers much better way to deploy to K8s

Best practice for sanity test a K8s cluster? (ideally all from command line)

I am new here, I tried to search for the topic before I post here, this may have been discussed before, please let me know before being to hash on me :)
In my project, after performing some changes on either the DevOps tool sets or infrastructures, we always do some manual sanity test, this normally includes:
Building a new image and update the helm chart
Push the image to Artifactory and perform a "helm update", and see it it runs.
I want to automate the whole thing, and try to get advice from the community, here's some requirement:
Validate Jenkins agent being able to talk to cluster ( I can do this with kubectl get all -n <some_namespace_jenkins_user_has_access_to)
Validate the cluster has access to Github (let's say I am using Argo CD to sync yamls)
Validate the cluster has access to Artifactory and able to pull image ( I don't want to build a new image with new tag and update helm chart, so that to force to cluster to pull new image)
All of the above can be done in command line (so that I can implement using Jenkins groovy)
Any suggestion is welcome.
Thanks guys

Your best bet is probably a combination of custom Jenkins scripts (i.e. running kubectl in Jenkins) and some in-cluster checks (e.g. using kuberhealthy).
So, when your Jenkins pipeline is triggered, it could do the following:
Check connectivity to the cluster
Build and push an image, etc.
Trigger in-cluster checks for testing if the cluster has access to GitHub and Artifactory, e.g. by launching a custom Job in the cluster, or creating a KuberhealthyCheck custom resource if you use kuberhealthy
During all this, the Jenkins pipeline writes the results of its test as metrics to a Pushgateway which is scraped by your Prometheus. The in-cluster checks also push their results as metrics to the Pushgateway, or expose them via kuberhealthy, if you decide to use it. In the end, you should have the results of all checks in the same Prometheus instance where you can react on them, e.g. creating Prometheus alerts or Grafana dashboards.

Application deployment over EKS using Jenkins

Can anyone tell me the deployment flow for deploying the application over Kubernetes or EKS cluster using Jenkins. How is the deployment files updated based on the change of the docker image. If we have multiple deployment files and we change any image for any one of them. Do all of them are redeployed?

Can anyone tell me the deployment flow for deploying the application over Kubernetes or EKS cluster using Jenkins.
Make sure that your Jenkins instance has an IAM Role and updated kubeconfig so that it can access the Kubernetes cluster. If you consider running the pipeline on the Kubernetes cluster, Jenkins X or Tekton Pipelines may be good alternatives that are better designed for Kubernetes.
How is the deployment files updated based on the change of the docker image.
It is a good practice to also keep the deployment manifest in version control, e.g. Git. This can be in the same repository or in a separate repository. For updating the image after a new image is built, consider using yq. An example yq command to update the image in a deployment manifest (one line):
yq write --inplace deployment.yaml 'spec.template.spec.containers(name==<myapp>).image' \
<my-registy-host>/<my-image-repository>/<my-image-name>:<my-tag-name>
If we have multiple deployment files and we change any image for any one of them. Do all of them are redeployed?
Nope, Kubernetes Yaml is declarative so it "understand" what is changed and only "drives" the necessary deployments to its "desired state" - since the other deployments already are in its "desired state".

How to pull from private project's image registry using GitLab managed Kubernetes clusters

GitLab offers to manage a Kubernetes cluster, which includes (e.g.) creating the namespace, adding some tokens, etc. In GitLab CI jobs, one can directly use the $KUBECONFIG variable for contacting the cluster and e.g. creating deployments using helm. This works like a charm, as long as the GitLab project is public and therefore Docker images hosted by the GitLab project's image registry are publicly accessible.
However, when working with private projects, Kubernetes of course needs an ImagePullSecret to authenticate the GitLab's image registry to retreive the image. As far as I can see, GitLab does not automatically provide an ImagePullSecret for repository access.
Therefore, my question is: What is the best way to access the image repository of private GitLab repositories in a Kubernetes deployment in a GitLab managed deployment environment?
In my opinion, these are the possibilities and why they are not eligible/optimal:
Permanent ImagePullSecret provided by GitLab: When doing a deployment on a GitLab managed Kubernetes cluster, GitLab provides a list of variables to the deployment script (e.g. Helm Chart or kubectl apply -f manifest.yml). As far as I can (not) see, there is a lot of stuff like ServiceAccounts and tokens etc., but no ImagePullSecret - and also no configuration option for enabling ImagePullSecret creation.
Using $CI_JOB_TOKEN: When working with GitLab CI/CD, GitLab provides a variable named $CI_JOB_TOKEN which can be used for uploading Docker images to the registry during job execution. This token expires after the job is done. It could be combined with helm install --wait, but when a rescheduling takes place to a new node which does not have the image yet, the token is expired and the node is not able to download the image anymore. Therefore, this only works right in the moment of deploying the app.
Creating an ImagePullSecret manually and add it to the Deployment or the default ServiceAccount: *This is a manual step, has to be repeated for each individual project and just sucks - we're trying to automate things/GitLab managed Kubernetes clusters is designed for avoiding any manual step.`
Something else but I don't know about it.
So, am I wrong in one of these points? Am I missing a eligible option in this listing?
Again: It's all about a seamless integration with the "Managed Cluster" features of GitLab. I know how to add tokens from GitLab as ImagePullSecrets in Kubernetes, but I want to know how to automate this with the Managed Cluster feature.

There is another way. You can bake the ImagePullSecret in your container runtime configuration. Docker, containerd or CRI-O (Whatever you are using)
Docker
As root run docker login <your-private-registry-url>. Then a file /root/.docker/config.json should be created/updated. Stick that in all your Kubernetes node and make sure your kubelet runs as root (which typically does). Some background info.
The content of the file should look something like this:
{
"auths": {
"my-private-registry": {
"auth": "xxxxxx"
}
},
"HttpHeaders": {
"User-Agent": "Docker-Client/18.09.2 (Linux)"
}
}
Containerd
Configure your containerd.toml file with something like this:
[plugins.cri.registry.auths]
[plugins.cri.registry.auths."https://gcr.io"]
username = ""
password = ""
auth = ""
identitytoken = ""
CRI-O
Specify the global_auth_file option in your crio.conf file.
✌️

Configure your account.
For example, for kubernetes pull image gitlab.com, use the address registry.gitlab.com:
kubectl create secret docker-registry regcred --docker-server=<your-registry-server> --docker-username=<your-name> --docker-password=<your-pword> --docker-email=<your-email>

terraforming with dependant providers

In my terraform infrastructure, I spin up several Kubernetes clusters based on parameters, then install some standard contents to those Kubernetes clusters using the kubernetes provider.
When I change the parameters and one of the clusters is no longer needed, terraform is unable to tear it down because the provider and resources are both in the module. I don't see an alternative, however, because I create the kubernetes cluster in that same module, and the kubernetes object are all per kubernetes cluster.
All solutions I can think of involve adding a bunch of boilerplate to my terraform config. Should I consider generating my terraform config from a script?
I made a git repo that shows exactly the problems I'm having:
https://github.com/bukzor/terraform-gke-k8s-demo

TL;DR
Two solutions:
Create two separate modules with Terraform
Use interpolations and depends_on between the code that creates your Kubernetes cluster and the kubernetes resources:
resource "kubernetes_service" "example" {
metadata {
name = "my-service"
}
depends_on = ["aws_vpc.kubernetes"]
}
resource "aws_vpc" "kubernetes" {
...
}
When destroying resources
You are encountering a dependency lifecycle issue
PS: I don't know the code you've used to create / provision your Kubernetes cluster but I guess it looks like this
Write code for the Kubernetes cluster (creates a VPC)
Apply it
Write code for provisionning Kubernetes (create an Service that creates an ELB)
Apply it
Try to destroy everything => Error
What is happenning is that by creating a LoadBalancer Service, Kubernetes will provision an ELB on AWS. But Terraform doesn't know that and there is no link between the ELB created and any other resources managed by Terraform.
So when terraform tries to destroy the resources in the code, it will try to destroy the VPC. But it can't because there is an ELB inside that VPC that terraform doesn't know about.
The first thing would be to make sure that Terraform "deprovision" the Kubernetes cluster and then destroy the cluster itself.
Two solutions here:
Use different modules so there is no dependency lifecycle. For example the first module could be k8s-infra and the other could be k8s-resources. The first one manages all the squeleton of Kubernetes and is apply first / destroy last. The second one manages what is inside the cluster and is apply last / destroy first.
Use the depends_on parameter to write the dependency lifecycle explicitly
When creating resources
You might also ran into a dependency issue when terraform apply cannot create resources even if nothing is applied yet. I'll give an other example with a postgres
Write code to create an RDS PostgreSQL server
Apply it with Terraform
Write code, in the same module, to provision that RDS instance with the postgres terraform provider
Apply it with Terraform
Destroy everything
Try to apply everything => ERROR
By debugging Terraform a bit I've learned that all the providers are initialized at the beggining of the plan / apply so if one has an invalid config (wrong API keys / unreachable endpoint) then Terraform will fail.
The solution here is to use the target parameter of a plan / apply command.
Terraform will only initialize providers that are related to the resources that are applied.
Apply the RDS code with the AWS provider: terraform apply -target=aws_db_instance
Apply everything terraform apply. Because the RDS instance is already reachable, the PostgreSQL provider can also initiate itself

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse