Use an Auto Scaling Group with ELB Health Checks - aws-cloudformation

Has anybody succeeded in using an Auto Scaling group with an ELB health check? It replaces the instances over and over. Is there a way to prevent that?
My template looks like this:
Resources:
  ECSAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AvailabilityZones:
        - Fn::Select:
            - '0'
            - Fn::GetAZs:
                Ref: AWS::Region
        - Fn::Select:
            - '1'
            - Fn::GetAZs:
                Ref: AWS::Region
        - Fn::Select:
            - '2'
            - Fn::GetAZs:
                Ref: AWS::Region
      VPCZoneIdentifier:
        - Fn::ImportValue: !Sub ${EnvironmentName}-PrivateEC2Subnet1
        - Fn::ImportValue: !Sub ${EnvironmentName}-PrivateEC2Subnet2
        - Fn::ImportValue: !Sub ${EnvironmentName}-PrivateEC2Subnet3
      HealthCheckGracePeriod: !Ref ASGHealthCheckGracePeriod
      HealthCheckType: !Ref ASGHealthCheckType
      LaunchTemplate:
        LaunchTemplateId: !Ref ECSLaunchTemplate
        Version: 1
      MetricsCollection:
        - Granularity: 1Minute
      ServiceLinkedRoleARN:
        !Sub arn:aws:iam::${AWS::AccountId}:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling
      DesiredCapacity: !Ref ASGDesiredCapacity
      MinSize: !Ref ASGMinSize
      MaxSize: !Ref ASGMaxSize
      TargetGroupARNs:
        - Fn::ImportValue: !Sub ${EnvironmentName}-WebTGARN
        - Fn::ImportValue: !Sub ${EnvironmentName}-DataTGARN
        - Fn::ImportValue: !Sub ${EnvironmentName}-GeneratorTGARN
      TerminationPolicies:
        - OldestInstance
The launch template looks like this:
  ECSLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: ECSLaunchtemplate
      LaunchTemplateData:
        ImageId: !FindInMap [AWSRegionToAMI, !Ref "AWS::Region", AMI]
        InstanceType: !Ref InstanceType
        SecurityGroupIds:
          - Fn::ImportValue: !Sub ${EnvironmentName}-ECSInstancesSecurityGroupID
        IamInstanceProfile:
          Arn:
            Fn::ImportValue:
              !Sub ${EnvironmentName}-ecsInstanceProfileARN
        Monitoring:
          Enabled: true
        CreditSpecification:
          CpuCredits: standard
        TagSpecifications:
          - ResourceType: instance
            Tags:
              - Key: "keyname1"
                Value: "value1"
        KeyName:
          Fn::ImportValue:
            !Sub ${EnvironmentName}-ECSKeyPairName
        UserData:
          "Fn::Base64": !Sub
            - |
              #!/bin/bash
              yum update -y
              yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
              yum update -y aws-cfn-bootstrap hibagent
              /opt/aws/bin/cfn-init -v --region ${AWS::Region} --stack ${AWS::StackName} --resource ECSLaunchTemplate
              /opt/aws/bin/cfn-signal -e $? --region ${AWS::Region} --stack ${AWS::StackName} --resource ECSAutoScalingGroup
              /usr/bin/enable-ec2-spot-hibernation
              echo ECS_CLUSTER=${ECSCluster} >> /etc/ecs/ecs.config
              PATH=$PATH:/usr/local/bin
            - ECSCluster:
                Fn::ImportValue:
                  !Sub ${EnvironmentName}-ECSClusterName
The load balancer config looks like this:
  ApplicationLoadBalancerInternet:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Name: !Sub ${EnvironmentName}-${Project}-ALB-Internet
      IpAddressType: !Ref ELBIpAddressType
      Type: !Ref ELBType
      Scheme: internet-facing
      Subnets:
        - Fn::ImportValue:
            !Sub ${EnvironmentName}-PublicSubnet1
        - Fn::ImportValue:
            !Sub ${EnvironmentName}-PublicSubnet2
        - Fn::ImportValue:
            !Sub ${EnvironmentName}-PublicSubnet3
      SecurityGroups:
        - Fn::ImportValue:
            !Sub ${EnvironmentName}-ALBInternetSecurityGroupID
As said, it's working fine with EC2 health checks, but when I switch to ELB health checks the instances are drained and the ASG spins up a new instance.
Thanks, A

I would troubleshoot it like this:
1. Delete this stack.
2. Edit your template and change the ASG health-check type to ELB (for now).
3. Create the new stack from either the CLI or the console. I recommend the CLI, since you might have to recreate the stack and it's far simpler/quicker than the console. The most important part is to use the disable-rollback option, so that when the stack fails you can still find out the reason for the failure.
I believe you will also be creating some IAM resources as part of this template, so an example CLI command would be this, for your quick reference:
aws cloudformation create-stack --stack-name Name-of-your-stack --template-body file://template.json --tags Key=Name,Value=Your_Tag_Value --profile default --region region --capabilities CAPABILITY_NAMED_IAM --disable-rollback
For more information on the requirement of CAPABILITY_NAMED_IAM, see this SO answer.
4. Now, when you create the stack, it's still going to fail, but now we can troubleshoot it. The reason we kept the health-check type at ELB in step 2 is that we actually want the ASG to replace the instances on failed health checks, so we can find out the reason in the ASG's Activity History tab in the console. Chances are high that you will see a message far more meaningful than the one returned by CloudFormation.
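If you prefer the CLI here as well, the same activity history should be visible with this command (substitute your ASG name and region):
aws autoscaling describe-scaling-activities --auto-scaling-group-name Name-of-your-ASG --region region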
5. Now that you have that error message, change the health-check type of the ASG to EC2 from the console, because we do not want the ASG to start a launch-and-terminate loop of EC2 instances.
6. Now, log in to your EC2 instance and look in the access logs for the hits from your ELB health check. In httpd, a successful TCP health check is typically logged as an HTTP 408, since the ELB opens and closes the connection without sending a request.
7. Also note that if the ELB health-check type is TCP:80, make sure there isn't a port conflict on your server; if you have selected HTTP:80, make sure you have also specified a path/file as your ping target.
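To see how the target group itself is judging the instance, the CLI can show the target state and the failure reason too (substitute your target group ARN):
aws elbv2 describe-target-health --target-group-arn your-target-group-arn --region region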
8. Since your launch template runs some user data as well, review /var/log/cfn-init.log and the other log files for any error messages. A simple option would be: grep error /var/log/*
9. At this point, you just have to make sure the ELB health check succeeds and the instance goes in-service behind the ELB. The most important part is to document all the troubleshooting steps, because you never know which of the many steps you tried actually fixed the health check.
10. Once you are able to find the cause, just put it in the template and you should be good to go. I have seen many templates going wrong at step 8.
11. Finally, do not forget to change the ASG health-check type back to ELB.

Related

How to correctly add the EnvironmentFile property to an ECS Container Definition in CloudFormation

I am trying to define an ECS cluster deployment using CloudFormation. So far I have been successful in defining and executing the template.
I decided to externalize the environment variables for the container by using the EnvironmentFile property in the AWS::ECS::TaskDefinition resource.
I think I'm using the correct syntax according to the documentation:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-taskdefinition-containerdefinitions.html
However, running the template in CloudFormation generates an error telling me that the keys I'm using for the EnvironmentFile definition are not permitted.
The strangest thing is that the stack update seems to complete successfully, and I can see the property when I look at the task definition in the console. Is this an error I should ignore, or is there a more correct way to define this property?
CloudFormation snippet:
TaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: !Ref 'ServiceName'
    Cpu: !Ref 'ContainerCpu'
    Memory: !Ref 'ContainerMemory'
    NetworkMode: awsvpc
    RequiresCompatibilities:
      - FARGATE
    ExecutionRoleArn: !Ref 'ECSTaskExecutionRole'
    TaskRoleArn:
      Fn::If:
        - 'HasCustomRole'
        - !Ref 'Role'
        - !Ref "AWS::NoValue"
    ContainerDefinitions:
      - Name: !Ref 'ServiceName'
        Cpu: !Ref 'ContainerCpu'
        Memory: !Ref 'ContainerMemory'
        Image: !Ref 'ImageUrl'
        EnvironmentFiles:
          - value: !Ref EnvFile
            type: s3
        PortMappings:
          - ContainerPort: !Ref 'ContainerPort'
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref ApplicationLogGroup
            awslogs-region: !Ref AWS::Region
            awslogs-stream-prefix: !Sub ${AWS::StackName}-ecs-service
Reported error:
Resource template validation failed for resource TaskDefinition as the template has invalid properties.
Please refer to the resource documentation to fix the template.
Properties validation failed for resource TaskDefinition with message:
#/ContainerDefinitions/0/EnvironmentFiles/0: extraneous key [type] is not permitted
#/ContainerDefinitions/0/EnvironmentFiles/0: extraneous key [value] is not permitted
OK, I'm answering this to close it. After trying several things, I realized that the value and type properties were in lowercase, and CloudFormation requires property names to start with an uppercase letter. Making this change removed the error:
EnvironmentFiles:
  - Value: !Ref EnvFile
    Type: s3
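Note also that, per the ECS documentation, Value must be the ARN of an environment file stored in S3 with a .env extension. If EnvFile is a plain object key rather than an ARN, you may need something like the following (the bucket and file names here are illustrative):
EnvironmentFiles:
  - Value: arn:aws:s3:::my-config-bucket/app.env
    Type: s3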

Introduce a condition on a custom resource in CloudFormation

Another engineer introduced a deploy-date parameter into our AMIFinder custom resource in the prod stack, which means we can no longer update the dev stack without attempting to recreate the EC2 instance.
Is it possible to introduce a condition based purely on the DeployDate parameter, so I can still use one template for both stacks?
FindAmiResource:
  Type: 'Custom::FindAmiFunction'
  Properties:
    ServiceToken:
      Fn::ImportValue:
        !Sub
          - cfn:find-ami:${AWSAccount}:arn
          - {AWSAccount: !FindInMap [AccountIDMap, Accounts, !Ref "AWS::AccountId"]}
    AmiName: 'Corp_w2016_Std-*'
    AmiOwner: '9999999999999'
    DeployDate: !Ref AMIDeployDate
Assuming you have some information to key off (like a known AccountId or a parameter in the stack), you can create a condition that identifies the stack as dev. Then you can use the Fn::If function, like this:
FindAmiResource:
  Type: 'Custom::FindAmiFunction'
  Properties:
    ServiceToken:
      Fn::ImportValue:
        !Sub
          - cfn:find-ami:${AWSAccount}:arn
          - {AWSAccount: !FindInMap [AccountIDMap, Accounts, !Ref "AWS::AccountId"]}
    AmiName: 'Corp_w2016_Std-*'
    AmiOwner: '9999999999999'
    DeployDate:
      Fn::If:
        - DevCondition
        - !Ref AWS::NoValue
        - !Ref AMIDeployDate
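This assumes DevCondition is declared in the template's Conditions section. A minimal sketch, keying off a hypothetical EnvironmentType parameter (any stack-identifying fact works here, such as the account ID from your AccountIDMap):
Parameters:
  EnvironmentType:
    Type: String
    AllowedValues: [dev, prod]
    Default: dev
Conditions:
  DevCondition: !Equals [!Ref EnvironmentType, dev]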

AWS CloudFormation: Nested Sub with Dynamic References using {{resolve}} causes error and doesn't execute resolve to get value from Parameter Store

I am trying to use an AWS CloudFormation template to create an EC2 instance with some user data generated using dynamic references and a cross-stack reference. There is a parameter stored in AWS Systems Manager Parameter Store with Name /MyCustomParameter and Value Test1.
The idea is to pass a parameter to the template stack (Stack A) which refers to another CloudFormation stack (Stack B). Stack B exports a variable with reference "StackB::ParameterStoreName". Stack A uses Fn::ImportValue: 'StackB::ParameterStoreName' to get its value, so that the dynamic-reference syntax {{resolve:ssm:/MyCustomParameter:1}} can read the value from AWS SSM Parameter Store and pass it to the UserData field in the template. I am facing difficulties while trying to use a nested Fn::Sub function for this use case.
I tried removing the | pipe and using double quotes with an escaped newline character, but that doesn't work.
I also tried using a different type of resource and its properties, where it worked. Below is an example of the code that worked:
Resources:
  TestBucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketName:
        Fn::Sub:
          - '${SSMParameterValue}-12345'
          - SSMParameterValue:
              Fn::Sub:
                - '{{resolve:ssm:${SSMParameterName}:1}}'
                - SSMParameterName:
                    Fn::ImportValue:
                      !Sub '${CustomStack}::ParameterStoreName'
Below is an extract of the current code I have:
Parameters:
  CustomStack:
    Type: "String"
    Default: "StackB"
Resources:
  MyCustomInstance:
    Type: 'AWS::EC2::Instance'
    Properties:
      UserData:
        Fn::Base64:
          Fn::Sub:
            - |
              #!/bin/bash -e
              #
              # Bootstrap and join the cluster
              /etc/eks/bootstrap.sh --b64-cluster-ca '${SSMParameterValue}' --apiserver-endpoint '${Endpoint}' '${ClusterName}'
            - SSMParameterValue:
                Fn::Sub:
                  - '{{resolve:ssm:/${SSMParameterName}:1}}'
                  - SSMParameterName:
                      Fn::ImportValue:
                        !Sub '${CustomStack}::ParameterStoreName'
              Endpoint:
                Fn::ImportValue:
                  !Sub '${CustomStack}::Endpoint'
              ClusterName:
                Fn::ImportValue:
                  !Sub '${CustomStack}::ClusterStackName'
Current Output:
#!/bin/bash -e
#
# Bootstrap and join the cluster
/etc/eks/bootstrap.sh --b64-cluster-ca `{{resolve:ssm:MyCustomParameter:1}}` --apiserver-endpoint 'https://04F1597P0HJ11FQ54K0YFM9P19.gr7.us-east-1.eks.amazonaws.com' 'eks-cluster-1'
Expected Output:
#!/bin/bash -e
#
# Bootstrap and join the cluster
/etc/eks/bootstrap.sh --b64-cluster-ca `Test1` --apiserver-endpoint 'https://04F1597P0HJ11FQ54K0YFM9P19.gr7.us-east-1.eks.amazonaws.com' 'eks-cluster-1'
I think it is because the resolve is in the Base64, maybe...? When CloudFormation processes the line it just sees a block of Base64, not the {{resolve...}} code. The resolves get processed in a later pass than the !Functions, because they can't be resolved until the stack is executing.
To work around it, I added a temporary SSM parameter:
eksCAtmp:
  Type: "AWS::SSM::Parameter"
  Properties:
    Type: String
    Value:
      Fn::Join:
        - ''
        - - '{{resolve:ssm:'
          - Fn::ImportValue:
              !Sub "${ClusterName}-EksCA"
          - ':1}}'
That imports the original SSM parameter and removes the need to import and resolve it again. So now you can use !GetAtt eksCAtmp.Value, e.g.:
UserData: !Base64
  "Fn::Sub":
    - |
      #!/bin/bash
      set -o xtrace
      /etc/eks/bootstrap.sh ${ClusterName} --b64-cluster-ca ${CA} --apiserver-endpoint ${endpoint} --kubelet-extra-args '--read-only-port=10255'
      /opt/aws/bin/cfn-signal --exit-code $? \
               --stack ${AWS::StackName} \
               --resource NodeGroup \
               --region ${AWS::Region}
    - endpoint:
        Fn::ImportValue:
          !Sub "${ClusterName}-EksEndpoint"
      CA: !GetAtt eksCAtmp.Value
(Of course, if cross-stack exports were allowed to be more than 1024 characters, we wouldn't need this for firing up EKS on a private network.)
Alternatively, you can write it like below, fetching the parameter at boot time instead of at template time:
UserData:
  Fn::Base64:
    Fn::Sub:
      - |
        #!/bin/bash -e
        #
        # Bootstrap and join the cluster
        export SSMParameterValue=$(aws --region ${AWS::Region} ssm get-parameters --names ${SSMParameterName} --query 'Parameters[0].Value' --output text)
        /etc/eks/bootstrap.sh --b64-cluster-ca "$SSMParameterValue" --apiserver-endpoint '${Endpoint}' '${ClusterName}'
      - SSMParameterName:
          Fn::ImportValue:
            !Sub '${CustomStack}::ParameterStoreName'
        Endpoint:
          Fn::ImportValue:
            !Sub '${CustomStack}::Endpoint'
        ClusterName:
          Fn::ImportValue:
            !Sub '${CustomStack}::ClusterStackName'
Don't forget that your EC2 role needs the ssm:GetParameters permission.
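A minimal sketch of that permission as an inline policy, assuming the instance profile's role is called NodeInstanceRole in the same template (the role and policy names here are illustrative):
NodeSSMReadPolicy:
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: allow-ssm-get-parameters
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Action: ['ssm:GetParameters']
          Resource: !Sub 'arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/MyCustomParameter'
    Roles: [!Ref NodeInstanceRole]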

504 Gateway Timeout using Application Load Balancer in ECS

I am deploying a Laravel web application on ECS and, in order to enable autoscaling, I am using an Application Load Balancer. The application worked (and scaled) perfectly until I introduced a heavyweight page, where I started to get 504 Gateway Timeout errors after a minute or so.
I am pretty sure the web server itself has a higher timeout (this never happens when the application is tested locally), so the problem must be related to the AWS environment (ECS / ALB).
Below you can find a snippet of the ALB settings:
AdminLoadBalancer:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    SecurityGroups:
      - !Ref 'AlbSecurityGroup'
    Subnets:
      - !Ref 'PublicSubnetAz1'
      - !Ref 'PublicSubnetAz2'
    Scheme: internet-facing
    Name: !Join ['-', [!Ref 'AWS::StackName', 'lb']]
After some attempts, I solved the issue by setting the idle timeout attribute of the load balancer, as explained here, since nothing was wrong with the individual ECS tasks. In CloudFormation, it was enough to add the LoadBalancerAttributes entry and double the default value (60 seconds):
AdminLoadBalancer:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    LoadBalancerAttributes:
      - Key: 'idle_timeout.timeout_seconds'
        Value: 120
    SecurityGroups:
      - !Ref 'AlbSecurityGroup'
    Subnets:
      - !Ref 'PublicSubnetAz1'
      - !Ref 'PublicSubnetAz2'
    Scheme: internet-facing
    Name: !Join ['-', [!Ref 'AWS::StackName', 'lb']]
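If you want to confirm or adjust the value on a running load balancer without updating the stack, the same attribute can also be set from the CLI (substitute your load balancer ARN):
aws elbv2 modify-load-balancer-attributes --load-balancer-arn your-alb-arn --attributes Key=idle_timeout.timeout_seconds,Value=120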

Cannot Delete Amazon ECS Cluster using CloudFormation

I am using the following CloudFormation template to create an ECS cluster.
AWSTemplateFormatVersion: '2010-09-09'
Description: 'AWS Cloudformation Template to create the Infrastructure'
Resources:
  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: 'Blog-iac-test-1'
  EC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles: [!Ref 'EC2Role']
  ECSAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier:
        - subnet-****
      LaunchConfigurationName: !Ref 'ECSAutoscalingLC'
      MinSize: '1'
      MaxSize: '2'
      DesiredCapacity: '1'
  ECSAutoscalingLC:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      AssociatePublicIpAddress: true
      ImageId: 'ami-b743bed1'
      SecurityGroups:
        - sg-****
      InstanceType: 't2.micro'
      IamInstanceProfile: !Ref 'EC2InstanceProfile'
      KeyName: 'test'
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -xe
          echo ECS_CLUSTER=Blog-iac-test-1 >> /etc/ecs/ecs.config
  EC2Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: [ec2.amazonaws.com]
            Action: ['sts:AssumeRole']
      Path: /
  ECSServicePolicy:
    Type: "AWS::IAM::Policy"
    Properties:
      PolicyName: "root"
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action: ['ecs:*', 'logs:*', 'ecr:*', 's3:*']
            Resource: '*'
      Roles: [!Ref 'EC2Role']
The stack is created successfully, but when destroying it I get the following error:
The Cluster cannot be deleted while Container Instances are active or draining.
I was able to delete the stack earlier; this issue only started to occur recently.
What could be a workaround to avoid this issue? Should I add some dependencies?
As mentioned in this AWS documentation link, have you tried deregistering the instances as well?
Deregister Container Instances: Before you can delete a cluster, you must deregister the container instances inside that cluster. For each container instance inside your cluster, follow the procedures in Deregister a Container Instance to deregister it.
Alternatively, you can use the following AWS CLI command to deregister your container instances. Be sure to substitute the Region, cluster name, and container instance ID for each container instance that you are deregistering.
aws ecs deregister-container-instance --cluster default --container-instance container_instance_id --region us-west-2 --force
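If the cluster has several container instances, a small shell loop over the CLI can handle them all before the stack delete. A sketch, assuming the cluster name from your template and a us-west-2 region (adjust both to your environment):
#!/bin/bash
# Deregister every container instance in the cluster so the stack can delete it.
CLUSTER=Blog-iac-test-1
REGION=us-west-2
for arn in $(aws ecs list-container-instances --cluster "$CLUSTER" --region "$REGION" --query 'containerInstanceArns[]' --output text); do
  aws ecs deregister-container-instance --cluster "$CLUSTER" --region "$REGION" --container-instance "$arn" --force
done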