How can I update an ECS service to add an additional load balancer without downtime? - amazon-ecs

We use terraform to manage our AWS resources and have the code to change the service from one to two load balancers.
However, Terraform wants to destroy the service prior to recreating it. The AWS CLI docs indicate the reason: the API can only set load balancers during service creation, not on update.
create-service
update-service
It seems we need a blue/green deploy, with the one-LB and two-LB services existing at the same time on the same cluster. I expect we'll need to create multiple task sets and the rest of the blue/green machinery prior to this change (we'd planned for this anyway, just not at this moment).
Does anyone have a good example for this scenario, or know of any other approaches beyond a full blue/green deployment?

Alas, it is not possible to change the number of LBs during an update. The service must be destroyed and recreated.
Ideally, one would be doing blue/green deploys with multiple ECS clusters and a set of LBs. Then cluster A could host the old service and cluster B the new service, allowing traffic to move from A to B as we go from blue to green.
We aren't quite there yet but plan to be soon. So, for now, we will do the classic parking lot switch approach:
In this example, the service that needs to go from 1 LB to 2 LBs is called target_service.
1. Clone target_service to become target_service2.
2. Deploy the microservices that can talk to either target_service or target_service2.
3. Verify that target_service and target_service2 are both handling incoming data and requests.
4. Modify the target_service infrastructure-as-code to go from 1 to 2 LBs.
5. Deploy the modified target_service (the Terraform deployment will destroy target_service, leaving target_service2 to cover the gap, and then recreate target_service with 2 LBs).
6. Verify that target_service with 2 LBs works and handles requests.
7. Destroy and remove target_service2, as it is no longer needed.
So, this is a blue/green-like deploy, albeit a less elegant one.
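For steps 3 and 6, a quick way to sanity-check both services from the CLI is something like the following (the cluster name is a placeholder, and the service names are the hypothetical ones from the steps above):

# Show status, running count, and attached target groups for both services
aws ecs describe-services \
  --cluster my-cluster \
  --services target_service target_service2 \
  --query 'services[].{name:serviceName,status:status,running:runningCount,targetGroups:loadBalancers[].targetGroupArn}'

The same call after step 5 also shows which target groups the recreated target_service is attached to, which makes it easy to confirm that both LBs were picked up.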

To update an Amazon ECS service with multiple load balancers, you need to ensure that you are using version 2 of the AWS CLI. Then, you can run the following command:
aws ecs update-service \
--cluster <cluster-name> \
--service <service-name> \
--load-balancers "[{\"containerName\": \"<container-name>\", \"containerPort\": <container-port>, \"targetGroupArn\": \"<target-group-arn1>\"}, {\"containerName\": \"<container-name>\", \"containerPort\": <container-port>, \"targetGroupArn\": \"<target-group-arn2>\"}]"
In the above command, replace <cluster-name> with the name of your ECS cluster, <service-name> with the name of your ECS service, <container-name> with the name of the container in your task definition, <container-port> with the port number the container listens on, and <target-group-arn1> and <target-group-arn2> with the ARNs of your target groups.

Update 2022
ECS now supports updating the load balancers of a service. For now, this can be done using the AWS CLI, AWS SDK, or AWS CloudFormation (not from the AWS Console). From the documentation:
When you add, update, or remove a load balancer configuration, Amazon ECS starts new tasks with the updated Elastic Load Balancing configuration, and then stops the old tasks when the new tasks are running.
So, you don't need to create a new service. The AWS announcement about this change is here. Please refer to the UpdateService API documentation for how to make the request.
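Assuming a recent enough CLI, after calling update-service with the new --load-balancers value (as in the command shown earlier) you can watch the rollout and wait for the service to settle; the cluster and service names below are placeholders:

# Watch the new deployment replace the old tasks
aws ecs describe-services --cluster my-cluster --services my-service \
  --query 'services[0].deployments[].{status:status,desired:desiredCount,running:runningCount}'

# Block until ECS reports the service as stable with the new LB configuration
aws ecs wait services-stable --cluster my-cluster --services my-service

The wait command simply blocks until the service reaches a steady state, which is a convenient checkpoint in a deployment script.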

Related

How to deploy container to ECS+Fargate+CodeDeploy without load balancer?

I have an app that comes in two pieces:
The (Ruby/Rails) main app which is also the web frontend
A Sidekiq event handler
The first one is not a problem; I've done that. CodeDeploy deploys the new version by creating the new service(s) on the "blue load balancer target group" and starts moving traffic to them in a blue/green setup.
However, Sidekiq doesn't have a port, so there is no health check. This is the problem I'm having now: the NLB can't validate the health of the container, so it keeps restarting, and the deploy eventually times out because it can't get a clean service.
I'm not sure if this is an ECS limitation or a CodeDeploy limitation (or a "me limitation" :) ), but it seems I have to have a load balancer for that to work.
But something about that just rings wrong! I'm sure AWS must have thought about "worker containers", right? How do you set up an ECS cluster with two services, each with one task - the two parts above? And how do you then use CodeDeploy to deploy to that service/task? Or do you deploy Sidekiq (in this case) in some other way?
Having two services, I can scale them individually.
At the moment, I have a CodePipeline triggering a CodeBuild to build the image and push it to ECR, and then triggering CodeDeploy to do the deployment. It worked initially when I had one service with the two tasks, but then both would scale simultaneously, which isn't what I want in the end.

How to add external GCP loadbalancer to kubespray cluster?

I deployed a Kubernetes cluster on Google Cloud using VMs and Kubespray.
Right now, I am looking to expose a simple node app on an external IP using a load balancer, but assigning my external IP from gcloud to the service does not work. It stays in the Pending state when I query kubectl get services.
According to this, Kubespray does not have any load balancer mechanism included/integrated by default. How should I proceed?
Let me start off by summarizing the problem we are trying to solve here.
The problem is that you have a self-hosted Kubernetes cluster and you want to be able to create a Service of type=LoadBalancer, with k8s creating an LB with an external IP for you in a fully automated way, just like it would if you used GKE (a Kubernetes-as-a-service solution).
Additionally, I have to mention that I don't know much about Kubespray, so I will only describe the steps that need to be done to make it work and leave the rest to you. If you need to make changes in the Kubespray code, that part is on you.
I did all my tests with a kubeadm cluster, but it should not be very difficult to apply this to Kubespray.
I will start by summarizing everything that has to be done in 4 steps:
tagging the instances
enabling cloud-provider functionality
IAM and service accounts
additional info
Tagging the instances
All worker node instances on GCP have to be labeled with a unique tag that is the name of the instance; these tags are later used to create firewall rules and target lists for the LB. So let's say you have an instance called worker-0; you need to tag that instance with the tag worker-0.
Otherwise it will result in an error (that can be found in controller-manager logs):
Error syncing load balancer: failed to ensure load balancer: no node tags supplied and also failed to parse the given lists of hosts for tags. Abort creating firewall rule
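For example, the tag can be added from the CLI; the instance name and zone below are placeholders:

# Tag the instance with its own name so the GCE cloud provider can build
# firewall rules and LB target pools from it
gcloud compute instances add-tags worker-0 --tags=worker-0 --zone=europe-west3-a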
Enabling cloud-provider functionality
K8s has to be informed that it is running in a cloud and which cloud provider that is, so that it knows how to talk to the API. Otherwise you will see controller-manager logs informing you that it won't create an LB:
WARNING: no cloud provider provided, services of type LoadBalancer will fail
The controller manager is responsible for creating the LoadBalancer, and it can be passed the flag --cloud-provider. You can manually add this flag to the controller-manager pod manifest file; or, in your case, since you are running Kubespray, you can set this flag somewhere in the Kubespray configuration (maybe it's already automated and just requires you to set some variable, but you will need to find that out yourself).
Here is what this file looks like with the flag:
apiVersion: v1
kind: Pod
metadata:
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    ...
    - --cloud-provider=gce # <----- HERE
As you can see, the value in our case is gce, which stands for Google Compute Engine. It informs k8s that it's running on GCE/GCP.
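A quick way to confirm the flag took effect is to check that the controller-manager log no longer prints the warning shown above (the pod name suffix will differ in your cluster):

kubectl -n kube-system logs kube-controller-manager-master-0 | grep -i "cloud provider"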
IAM and service accounts
Now that you have your provider enabled and tags covered, let's talk about IAM and permissions.
For k8s to be able to create an LB in GCE, it needs to be allowed to do so. Every GCE instance has a default service account assigned. The controller manager uses the instance's service account, stored within the instance metadata, to access the GCP API.
For this to happen, you need to set access scopes for the GCE instance (the master node; the one where the controller manager is running) so it can use the Compute Engine API.
Access scopes -> Set access for each API -> compute engine=Read Write
To do this, the instance has to be stopped, so stop it now. It's better to set these scopes during instance creation so that you don't need to take any unnecessary extra steps.
You also need to go to the IAM & Admin page in the GCP Console and add permissions so that the master instance's service account has the Kubernetes Engine Service Agent role assigned. This is a predefined role with far more permissions than you probably need, but I have found that everything works with this role, so I used it for demonstration purposes; in practice you probably want to follow the principle of least privilege.
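If you prefer the CLI over the console, the scope and role changes could look roughly like this (project, zone, instance, and service account values are placeholders):

# Scopes can only be changed while the instance is stopped
gcloud compute instances stop master-0 --zone=europe-west3-a
# Keep the default service account but grant the Compute Engine read-write scope
gcloud compute instances set-service-account master-0 \
  --zone=europe-west3-a \
  --service-account=123456789012-compute@developer.gserviceaccount.com \
  --scopes=https://www.googleapis.com/auth/compute
gcloud compute instances start master-0 --zone=europe-west3-a

# Grant the instance's service account the Kubernetes Engine Service Agent role
gcloud projects add-iam-policy-binding my-project \
  --member=serviceAccount:123456789012-compute@developer.gserviceaccount.com \
  --role=roles/container.serviceAgent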
Additional info
There is one more thing I need to mention. It does not affect you, but while testing I found out something interesting.
At first, I created a single-node cluster (just a master node). Even though this is allowed from the k8s point of view, the controller manager would not let me create an LB and point it at the master node where my application was running. The conclusion is that you cannot use an LB with only a master node; you have to create at least one worker node.
PS
I had to figure this out the hard way: by looking at logs, changing things, and looking at the logs again to see if the issue was solved. I didn't find a single article or documentation page where all of this is documented in one place. If you manage to solve it for your setup, please write up the answer for others. Thank you.

Does AWS CDK provide constructs to deploy ECS applications?

We generally use blue/green and rolling deployment strategies to deploy and update Docker containers on ECS container instances.
Ansible allows implementing such deployment strategies with the ECS modules below:
https://docs.ansible.com/ansible/latest/modules/ecs_taskdefinition_module.html
https://docs.ansible.com/ansible/latest/modules/ecs_task_module.html
https://docs.ansible.com/ansible/latest/modules/ecs_service_module.html
Does AWS CDK provide such constructs for implementing deployment strategies?
CDK supports higher-level constructs for ECS called "ECS patterns". One of them is ApplicationLoadBalancedFargateService, which allows you to define an ECS Fargate service behind an Application Load Balancer. Rolling updates are supported out of the box in this case. You simply run cdk deploy with a newer Docker image and ECS will take care of the deployment. It will:
Start a new task with the new Docker image.
Wait for several successful health checks of the new deployment.
Start sending new traffic to the new task, while letting the existing connections gracefully finish on the old task.
Once all old connections are done, ECS will automatically stop the old task.
If your new task does not start or is not healthy, ECS will keep running the original task.
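In practice that workflow is just a rebuild-and-redeploy. Here is a minimal sketch, assuming the image lives in ECR and the CDK app reads the tag from a context value (account, region, repository, stack name, and the imageTag context key are all hypothetical):

# Build and push the new image
docker build -t 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-app:v2 .
aws ecr get-login-password --region eu-west-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com
docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-app:v2

# Redeploy the stack; ECS performs the rolling update described above
cdk deploy MyFargateStack --context imageTag=v2

If the CDK app instead builds the image itself (e.g. from a local Dockerfile asset), a plain cdk deploy after changing the code is enough.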
Regarding blue/green deployment, I think it's yet to be supported in CloudFormation. Once that's done, it can be implemented in CDK. If you can live without blue/green as IaC, you can set up CodeDeploy manually.
Check this npm package, which helps with blue/green deployment using CDK:
https://www.npmjs.com/package/@cloudcomponents/cdk-blue-green-container-deployment
Blue/green deployments are supported in CloudFormation now:
https://aws.amazon.com/about-aws/whats-new/2020/05/aws-cloudformation-now-supports-blue-green-deployments-for-amazon-ecs/
I don't think the CDK implementation is done yet.

How can I set up workers in Kubernetes (infrastructure questions)

I'm using Kubernetes and I would like to set up workers. One of my Docker containers hosts an API using Flask, I have an algorithm in another container (same pod; I don't know if I should leave it in the same one), and other scripts that are also in separate containers.
I want to link all of these: when I receive a request on the API, call the other containers depending on the request and get the result back.
I don't know how to do that with multiple containers, and therefore with Kubernetes.
Until now I've been using the RQ library for Python to parallelize work, but that was on Heroku without Kubernetes (I'm migrating to Azure at the moment), and I don't know how it manages this behind the scenes.
Thank you.
Follow the reference below and set up a Kubernetes cluster using kubeadm:
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
Using the 'kubeadm join' command, you should be able to add worker nodes to the master.
The link above has the steps for joining a worker to the master.
If you are using Azure, you can try exploring AKS. It works out of the box. You just need to configure kubectl and you will be good to go.
Regarding deploying multiple microservices (APIs), you can deploy each microservice as a separate k8s Deployment using kubectl and expose it using a Service. This way they can communicate with each other using the exposed endpoints (APIs) or a message queue.
Here is a quick guide you can use: https://dzone.com/articles/quick-guide-to-microservices-with-kubernetes-sprin
Typically you should use only one container per pod. Multiple containers per pod are possible but are typically used for sidecars, not for additional APIs.
You expose your applications using Kubernetes Services; there's no need to run everything on a different port if you don't want to.
A minimal setup for typical web API calls would look something like this (if you expose your API service as a public LoadBalancer, you don't necessarily need an Ingress):
Client -> (Ingress) -> API service -> API deployment pod(s) -> internal services -> deployment pods.
You can access your internal services from within your cluster using http(s)://servicename[:custom-port]
On the other hand, if you simply use Flask to forward API calls to other services, you might want to replace it with an Ingress controller that does all the routing for you.
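As a rough sketch of that wiring (image names and ports are made up), each piece gets its own Deployment plus Service, and the API reaches the workers through the Service DNS name:

# Public-facing Flask API
kubectl create deployment api --image=myregistry/flask-api:latest
kubectl expose deployment api --port=80 --target-port=5000 --type=LoadBalancer

# Internal worker/algorithm service, only reachable inside the cluster
kubectl create deployment algo --image=myregistry/algo:latest
kubectl expose deployment algo --port=8000 --type=ClusterIP

# From inside an api pod, the worker answers at http://algo:8000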

Remove ECS task for forensic analysis

I use Amazon ECS for my services. ECS autoregisters the containers with the ELB. That is all fine.
However, if one of my applications gets into a failure mode, I want to take it out of the ELB but keep it running so I can do forensics on it.
If I remove it from the ELB manually then ECS will just add it back in.
The Auto Scaling group has a "standby" state; I guess I am looking for something similar in ECS.
Is that possible somehow?