I have an ECS Fargate service which uses CloudWatch alarms to scale in/scale out via service auto scaling. The task containers have long processing times (up to 40 minutes) and I don't want a running container to get killed when a scale-in happens. Is there a way to do that for an ECS task/service?
PS: I have looked at the stopTimeout property in a task-definition but its max value is only 120 seconds. I have also looked at scale-in protection for EC2 instances but haven't found any such solution for an ECS Fargate task.
Support for ECS task scale-in protection was released on 2022-11-10: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-scale-in-protection.html
In summary, you can use the new ECS container agent endpoint from inside a task to mark it as protected:
curl -X PUT -H 'Content-Type: application/json' -d '{"ProtectionEnabled":true}' $ECS_AGENT_URI/task-protection/v1/state
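If the long-running work happens inside your application code, the same call can be made from there. A minimal sketch in Python using only the standard library (exact response handling is up to you):

import json
import os
import urllib.request

# The ECS agent injects ECS_AGENT_URI into the task's environment.
uri = os.environ["ECS_AGENT_URI"] + "/task-protection/v1/state"
payload = json.dumps({"ProtectionEnabled": True}).encode()

request = urllib.request.Request(
    uri,
    data=payload,
    method="PUT",
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode())  # echoes the current protection state

Set ProtectionEnabled back to false once the 40-minute job finishes so scale-in can proceed normally.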
Alternatively, you can use the UpdateTaskProtection API to achieve the same result from outside the task: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_UpdateTaskProtection.html
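From outside the task, a hedged boto3 sketch of the same operation (the cluster name and task ARN are placeholders, and expiresInMinutes is optional):

import boto3

ecs = boto3.client("ecs")

# Protect a specific task from scale-in for up to 60 minutes.
# "my-cluster" and the task ARN below are placeholders.
ecs.update_task_protection(
    cluster="my-cluster",
    tasks=["arn:aws:ecs:us-east-1:123456789012:task/my-cluster/0123456789abcdef"],
    protectionEnabled=True,
    expiresInMinutes=60,
)

Note that the caller (or the task role, when using the agent endpoint) needs the ecs:UpdateTaskProtection permission.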
I have an ECS Fargate cluster up and running, and it has 1 service and 1 task definition attached to it.
The task definition already describes 2 container images. This cluster is up and running.
Can I create a new service for another application and configure it with this existing ECS cluster?
If yes, will both services run simultaneously?
From the AWS documentation regarding Amazon ECS clusters:
An Amazon ECS cluster is a logical grouping of tasks or services. Your
tasks and services are run on infrastructure that is registered to a
cluster.
So I believe you should be able to run multiple services in the same ECS cluster, each attached to its own task definition.
Source Documentation - https://docs.aws.amazon.com/AmazonECS/latest/developerguide/clusters.html
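As a hedged sketch of what adding a second service to the existing cluster could look like with boto3 (the cluster, task definition, and network settings are placeholders):

import boto3

ecs = boto3.client("ecs")

# Register a second service against the same existing cluster.
# "my-cluster", "second-app-task:1" and the network settings are placeholders.
ecs.create_service(
    cluster="my-cluster",
    serviceName="second-app-service",
    taskDefinition="second-app-task:1",
    desiredCount=1,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "assignPublicIp": "ENABLED",
        }
    },
)

Both services then run side by side in the same cluster, each maintaining its own desired count of tasks.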
We generally use blue/green and rolling deployment strategies to deploy and update Docker containers on ECS container instances.
The Ansible ECS modules below allow implementing such deployment strategies:
https://docs.ansible.com/ansible/latest/modules/ecs_taskdefinition_module.html
https://docs.ansible.com/ansible/latest/modules/ecs_task_module.html
https://docs.ansible.com/ansible/latest/modules/ecs_service_module.html
Does AWS CDK provide such constructs for implementing deployment strategies?
CDK supports higher-level constructs for ECS called "ECS patterns". One of them is ApplicationLoadBalancedFargateService, which lets you define an ECS Fargate service behind an Application Load Balancer. Rolling update is supported out of the box in this case: you simply run cdk deploy with a newer Docker image (see the sketch after the list below) and ECS will take care of the deployment. It will:
Start a new task with the new Docker image.
Wait for several successful health checks of the new deployment.
Start sending new traffic to the new task, while letting the existing connections gracefully finish on the old task.
Once all old connections are done, ECS will automatically stop the old task.
If your new task does not start or is not healthy, ECS will keep running the original task.
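A minimal sketch of that pattern in CDK for Python, assuming CDK v2 (the stack name and image are placeholders; by default the construct also creates a VPC, cluster, and load balancer for you):

from aws_cdk import App, Stack
from aws_cdk import aws_ecs as ecs
from aws_cdk import aws_ecs_patterns as ecs_patterns

app = App()
stack = Stack(app, "FargateServiceStack")

# Fargate service behind an ALB; rolling updates happen on each deploy.
ecs_patterns.ApplicationLoadBalancedFargateService(
    stack, "Service",
    desired_count=2,
    task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
        image=ecs.ContainerImage.from_registry("my-registry/my-app:1.2.3"),
    ),
)

app.synth()

Pushing a new image tag and running cdk deploy again triggers the rolling update described above.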
Regarding blue/green deployment, I think it's yet to be supported in CloudFormation. Once that's done, it can be implemented in CDK. If you can live without blue/green as IaC, you can define your CodeDeploy deployment manually.
Check this npm package, which helps with blue/green deployment using CDK:
https://www.npmjs.com/package/@cloudcomponents/cdk-blue-green-container-deployment
Blue/green deployments are supported in CloudFormation now:
https://aws.amazon.com/about-aws/whats-new/2020/05/aws-cloudformation-now-supports-blue-green-deployments-for-amazon-ecs/
I don't think the CDK implementation is done yet.
We have an application that creates/starts/stops containers inside AWS ECS. We are not making use of ECS services because we don't want a container to be restarted once it has been stopped by the application.
So how can we automate scale-in/scale-out of the cluster instances in ECS without using ECS services?
The documentation below explains step by step how to scale your container instances.
Scaling Container Instances
Here is how this works:
Say you have one container instance and two services running on it.
You need to scale an ECS service up, but it cannot scale because there are no resources available on the single container instance.
Following the documentation, you can set up a CloudWatch alarm on, say, the MemoryReservation metric for your cluster.
When the memory reservation of your cluster rises above 75% (meaning that only 25% of the memory in your cluster is available for new tasks to reserve), the alarm triggers the Auto Scaling group to add another instance and provide more resources for your tasks and services.
Depending on the Amazon EC2 instance types that you use in your
clusters, and quantity of container instances that you have in a
cluster, your tasks have a limited amount of resources that they can
use while running. Amazon ECS monitors the resources available in the
cluster to work with the schedulers to place tasks. If your cluster
runs low on any of these resources, such as memory, you are eventually
unable to launch more tasks until you add more container instances,
reduce the number of desired tasks in a service, or stop some of the
running tasks in your cluster to free up the constrained resource.
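A hedged boto3 sketch of such an alarm (the cluster name and the scaling policy ARN are placeholders; the policy itself would be a scale-out policy on the Auto Scaling group that provides the cluster's container instances):

import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholder: ARN of a scale-out policy on the cluster's Auto Scaling group.
scale_out_policy_arn = "arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:example"

# Alarm when the cluster's memory reservation stays above 75%.
cloudwatch.put_metric_alarm(
    AlarmName="ecs-cluster-memory-reservation-high",
    Namespace="AWS/ECS",
    MetricName="MemoryReservation",
    Dimensions=[{"Name": "ClusterName", "Value": "my-cluster"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=75,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[scale_out_policy_arn],
)

A second alarm on a low MemoryReservation value can point at a scale-in policy in the same way.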
I have built a container and run it successfully on AWS ECS Fargate. The container downloads a file from S3, processes it, and uploads the result to S3. But because of the ECS service's restart behavior, the service restarts the container, which I don't want.
In Kubernetes I would use restartPolicy: OnFailure, but I have read the ECS docs, including the full task definition and service definition references.
The closest parameter I found is "dockerLabels", where I set "--restart": "no", but it didn't work.
How can I keep the container from being restarted in ECS?
What you are describing aligns with the use case for running ECS tasks manually or according to a schedule. Unlike running tasks with the service scheduler, the tasks won't be restarted and are ideal for one-time or periodic batch execution workloads. You can run a task manually from the AWS Console or using the RunTask API. Tasks can be used to mimic Jobs and CronJobs in Kubernetes.
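As a hedged sketch, a standalone Fargate task could be launched with RunTask via boto3 like this (the cluster, task definition, and subnet are placeholders):

import boto3

ecs = boto3.client("ecs")

# Launch a one-off Fargate task; ECS will not replace it when it exits.
# "my-cluster", "s3-processing-task:3" and the subnet are placeholders.
ecs.run_task(
    cluster="my-cluster",
    taskDefinition="s3-processing-task:3",
    launchType="FARGATE",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "assignPublicIp": "ENABLED",
        }
    },
)

For periodic runs, the same task definition can be triggered by an EventBridge (CloudWatch Events) scheduled rule instead of a service.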
Services will always maintain the desired number of tasks and this behavior can't be modified:
If any of your tasks should fail or stop for any reason, the Amazon ECS service scheduler launches another instance of your task definition to replace it and maintain the desired count of tasks in the service depending on the scheduling strategy used.
I use Amazon ECS for my services. ECS auto-registers the containers with the ELB. That is all fine.
However, if one of my applications gets into a failure mode, I want to take it out of the ELB but keep it running so I can do forensics on it.
If I remove it from the ELB manually then ECS will just add it back in.
Auto Scaling groups have a "standby" state; I guess something similar is what I am looking for in ECS.
Is that possible somehow?