fargate failing on docker pull in private subnet - aws-cloudformation

I am having trouble deploying a Fargate cluster: it fails on the Docker image pull with the error "CannotPullContainerError". I am creating the stack with CloudFormation, which is not optional. It creates the full stack, but fails with the above error when trying to start the task.
I have attached the CloudFormation stack file, which might highlight the problem, and I have double-checked that the subnet has a route to the NAT gateway (below). I also SSH'ed into an instance in the same subnet, and it was able to route externally. I am wondering whether I have placed the pieces incorrectly, i.e. the service and load balancer are both in the private subnet. Should I not be placing the internal LB in the same subnet?
This subnet is the one that currently has the placement, but all 3 in the file have the same NAT settings.
subnet routable (subnet-34b92250)
* 0.0.0.0/0 -> nat-05a00385366da527a
cheers in advance.
YAML CloudFormation script:
AWSTemplateFormatVersion: 2010-09-09
Description: Cloudformation stack for the new GRPC endpoints within existing vpc/subnets and using fargate
Parameters:
  StackName:
    Type: String
    Default: cf-core-ci-grpc
    Description: The name of the parent Fargate networking stack that you created. Necessary
  vpcId:
    Type: String
    Default: vpc-0d499a68
    Description: The name of the parent Fargate networking stack that you created. Necessary
Resources:
  CoreGrcpInstanceSecurityGroupOpenWeb:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupName: sgg-core-ci-grpc-ingress
      GroupDescription: Allow http to client host
      VpcId: !Ref vpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '80'
          ToPort: '80'
          CidrIp: 0.0.0.0/0
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: '80'
          ToPort: '80'
          CidrIp: 0.0.0.0/0
  LoadBalancer:
    Type: 'AWS::ElasticLoadBalancingV2::LoadBalancer'
    DependsOn:
      - CoreGrcpInstanceSecurityGroupOpenWeb
    Properties:
      Name: lb-core-ci-int-grpc
      Scheme: internal
      Subnets:
        # # pub
        # - subnet-f13995a8
        # - subnet-f13995a8
        # - subnet-f13995a8
        # pri
        - subnet-34b92250
        - subnet-82d85af4
        - subnet-ca379b93
      LoadBalancerAttributes:
        - Key: idle_timeout.timeout_seconds
          Value: '50'
      SecurityGroups:
        - !Ref CoreGrcpInstanceSecurityGroupOpenWeb
  TargetGroup:
    Type: 'AWS::ElasticLoadBalancingV2::TargetGroup'
    DependsOn:
      - LoadBalancer
    Properties:
      Name: tg-core-ci-grpc
      Port: 3000
      TargetType: ip
      Protocol: HTTP
      HealthCheckIntervalSeconds: 30
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 10
      HealthyThresholdCount: 4
      Matcher:
        HttpCode: '200'
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: '20'
      UnhealthyThresholdCount: 3
      VpcId: !Ref vpcId
  LoadBalancerListener:
    Type: 'AWS::ElasticLoadBalancingV2::Listener'
    DependsOn:
      - TargetGroup
    Properties:
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup
      LoadBalancerArn: !Ref LoadBalancer
      Port: 80
      Protocol: HTTP
  EcsCluster:
    Type: 'AWS::ECS::Cluster'
    DependsOn:
      - LoadBalancerListener
    Properties:
      ClusterName: ecs-core-ci-grpc
  EcsTaskRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                # - ecs.amazonaws.com
                - ecs-tasks.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: iam-policy-ecs-task-core-ci-grpc
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - 'ecr:**'
                Resource: '*'
  CoreGrcpTaskDefinition:
    Type: 'AWS::ECS::TaskDefinition'
    DependsOn:
      - EcsCluster
      - EcsTaskRole
    Properties:
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !Ref EcsTaskRole
      Cpu: '1024'
      Memory: '2048'
      ContainerDefinitions:
        - Name: container-core-ci-grpc
          Image: 'nginx:latest'
          Cpu: '256'
          Memory: '1024'
          PortMappings:
            - ContainerPort: '80'
              HostPort: '80'
          Essential: 'true'
  EcsService:
    Type: 'AWS::ECS::Service'
    DependsOn:
      - CoreGrcpTaskDefinition
    Properties:
      Cluster: !Ref EcsCluster
      LaunchType: FARGATE
      DesiredCount: '1'
      DeploymentConfiguration:
        MaximumPercent: 150
        MinimumHealthyPercent: 0
      LoadBalancers:
        - ContainerName: container-core-ci-grpc
          ContainerPort: '80'
          TargetGroupArn: !Ref TargetGroup
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: DISABLED
          SecurityGroups:
            - !Ref CoreGrcpInstanceSecurityGroupOpenWeb
          Subnets:
            - subnet-34b92250
            - subnet-82d85af4
            - subnet-ca379b93
      TaskDefinition: !Ref CoreGrcpTaskDefinition
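One detail in the template above that may be worth checking: the security group attached to the service only allows outbound TCP on port 80, while image pulls from Docker Hub (nginx:latest here) go over HTTPS on port 443. The fragment below is a minimal sketch of the same security group with an extra egress rule. It is an assumption about a possible cause, not a confirmed diagnosis, and the AWS::EC2::VPC::Id parameter type is a choice made for this sketch.

```yaml
Parameters:
  vpcId:
    Type: AWS::EC2::VPC::Id   # assumption: the original template uses a plain String
Resources:
  CoreGrcpInstanceSecurityGroupOpenWeb:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow http in, http and https out
      VpcId: !Ref vpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
      SecurityGroupEgress:
        # Existing rule: plain HTTP out
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
        # Added rule: HTTPS out, needed to reach the image registry
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0
```

If the task still cannot pull the image with this rule in place, the routing and registry-credential explanations remain the next things to look at.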

Unfortunately, AWS Fargate only supports images hosted in ECR or in public repositories on Docker Hub; it does not support private repositories hosted on Docker Hub. For more info: https://forums.aws.amazon.com/thread.jspa?threadID=268415
We faced the same problem with AWS Fargate a couple of months back. You have only two options right now:
* Migrate your images to Amazon ECR.
* Use AWS Batch with a custom AMI, where the custom AMI is built with Docker Hub credentials in the ECS config (which is what we are using right now).
Edit: As mentioned by Christopher Thomas in the comments, ECS Fargate now supports pulling images from private Docker Hub repositories. More info on how to set it up can be found here.
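For the private Docker Hub case mentioned in the edit, a minimal sketch of what the setup can look like in CloudFormation follows. It uses the RepositoryCredentials property of the container definition together with a Secrets Manager secret. The parameter name DockerHubSecretArn, the resource names and the image name are hypothetical; the secret is assumed to already exist and to hold the registry username and password.

```yaml
Parameters:
  # Hypothetical parameter: ARN of an existing Secrets Manager secret
  # that stores the Docker Hub username and password.
  DockerHubSecretArn:
    Type: String
Resources:
  ExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: read-dockerhub-secret
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              # The execution role must be able to read the secret
              - Effect: Allow
                Action: secretsmanager:GetSecretValue
                Resource: !Ref DockerHubSecretArn
  PrivateImageTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      Cpu: '256'
      Memory: '512'
      ExecutionRoleArn: !GetAtt ExecutionRole.Arn
      ContainerDefinitions:
        - Name: app
          Image: myorg/private-image:latest   # hypothetical private image
          RepositoryCredentials:
            CredentialsParameter: !Ref DockerHubSecretArn
```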

Define this policy on your ECR repository and attach the IAM role to your task.
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "new statement",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::99999999999:role/ecsEventsRole"
      },
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability",
        "ecr:PutImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload"
      ]
    }
  ]
}
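Since the question builds everything with CloudFormation, the same kind of repository policy can also be declared in the template through the RepositoryPolicyText property of AWS::ECR::Repository. The following is a minimal sketch under stated assumptions: the repository name and the PullRoleArn parameter are hypothetical, and only the pull-related actions from the policy above are kept.

```yaml
Parameters:
  # Hypothetical parameter: ARN of the role that pulls the image
  PullRoleArn:
    Type: String
Resources:
  AppRepository:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: core-ci-grpc   # hypothetical name
      RepositoryPolicyText:
        Version: '2012-10-17'
        Statement:
          - Sid: AllowPull
            Effect: Allow
            Principal:
              AWS: !Ref PullRoleArn
            Action:
              - ecr:GetDownloadUrlForLayer
              - ecr:BatchGetImage
              - ecr:BatchCheckLayerAvailability
```

Note that a repository policy cannot grant ecr:GetAuthorizationToken, so the role itself still needs that action in its own IAM policy before it can pull.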

Related

CloudFormation - How to add the bootstrap parameter to Ksql Server

I'm using CloudFormation scripts to build an EC2 container of Ksql Server (a Docker container). I have already built the other components within MSK, i.e. the bootstrap servers and listeners.
Within the AWS::ECS::TaskDefinition I have tried to add the bootstrap servers and listeners by using the 'Container' & 'Environment' properties within 'ContainerDefinition'. However, doing this leaves the EcsService stuck: its status stays at CREATE_IN_PROGRESS.
# Creating the ECS Task for KsqlDB
EcsKsqlTask:
  Type: AWS::ECS::TaskDefinition
  Properties:
    NetworkMode: awsvpc
    Cpu: '256'
    Memory: '1024'
    RequiresCompatibilities:
      - EC2
    ContainerDefinitions:
      - Name: KsqlServer
        Image: 123.dkr.ecr.eu-west-2.amazonaws.com/confluentinc/cp-ksql-server
        Essential: true
        # Environment:
        #   Name: KSQL_BOOTSTRAP_SERVERS
        #   Value: b-1.kafka.123.d1.eu-west-2.amazonaws.com:9092
        Command:
          - 'bin/bash docker run -d \ -v / KSQL_BOOTSTRAP_SERVERS=b-1.kafka.123.c3.eu-west-2.amazonaws.com:9092 \ -e KSQL_KSQL_SERVICE_ID=ksql_standalone_1_ \ -e KSQL_KSQL_QUERIES_FILE=/path/in/container/queries.sql \ confluentinc/ksqldb-server:0.26.0'
        PortMappings:
          - ContainerPort: 8080
            Protocol: tcp
          - ContainerPort: 22
            Protocol: tcp
    ExecutionRoleArn: !Ref EcsRole
    TaskRoleArn: !Ref EcsRole
# Creating the ECS Service for KsqlDB
EcsService:
  Type: AWS::ECS::Service
  Properties:
    ServiceName: EcsKsqlService
    TaskDefinition: !Ref EcsKsqlTask
    Cluster: !Ref EcsCluster
    LaunchType: EC2
    NetworkConfiguration:
      AwsvpcConfiguration:
        AssignPublicIp: DISABLED
        SecurityGroups:
          - !Ref EcsSecurityGroup
        Subnets:
          - !Ref PrivateSubnetOne
          - !Ref PrivateSubnetTwo
Any help on any property I am missing would be greatly appreciated!
Added like so:
ContainerDefinitions:
  - Name: KsqlCli
    Image: Images/ksql-cli
    Essential: true
    Environment:
      - Name: KSQL_BOOTSTRAP_SERVERS
        Value: b-3.boostrap.amazonaws.com
      - Name: KSQL_KSQL_SERVICE_ID
        Value: confluent_ksql_01
      - Name: KSQL_LISTENERS
        Value: http://localhost:8088

How to correctly add the EnvironmentFiles property to an ECS Container Definition in CloudFormation

I am trying to define an ECS cluster deployment using CloudFormation. So far I have been successful with defining and executing the template.
I decided to externalize the environment variables for the container by using the EnvironmentFiles property in the AWS::ECS::TaskDefinition resource.
I think I'm using the correct syntax according to the documentation:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-taskdefinition-containerdefinitions.html
However, running the template in CloudFormation generates an error telling me that the keys I'm using for the EnvironmentFiles definition are not permitted.
The strangest thing is that the stack update seems to complete successfully, and I can see the property when I look at the task definition in the console. Is this an error I should ignore, or is there a more correct way to define this property?
CloudFormation snippet:
TaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: !Ref 'ServiceName'
    Cpu: !Ref 'ContainerCpu'
    Memory: !Ref 'ContainerMemory'
    NetworkMode: awsvpc
    RequiresCompatibilities:
      - FARGATE
    ExecutionRoleArn: !Ref 'ECSTaskExecutionRole'
    TaskRoleArn:
      Fn::If:
        - 'HasCustomRole'
        - !Ref 'Role'
        - !Ref "AWS::NoValue"
    ContainerDefinitions:
      - Name: !Ref 'ServiceName'
        Cpu: !Ref 'ContainerCpu'
        Memory: !Ref 'ContainerMemory'
        Image: !Ref 'ImageUrl'
        EnvironmentFiles:
          - value: !Ref EnvFile
            type: s3
        PortMappings:
          - ContainerPort: !Ref 'ContainerPort'
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref ApplicationLogGroup
            awslogs-region: !Ref AWS::Region
            awslogs-stream-prefix: !Sub ${AWS::StackName}-ecs-service
Reported error:
Resource template validation failed for resource TaskDefinition as the template has invalid properties.
Please refer to the resource documentation to fix the template.
Properties validation failed for resource TaskDefinition with message:
#/ContainerDefinitions/0/EnvironmentFiles/0: extraneous key [type] is not permitted
#/ContainerDefinitions/0/EnvironmentFiles/0: extraneous key [value] is not permitted
OK, I'm answering this to close it. After trying several things, I realized that the value and type keys were in lowercase, and CloudFormation requires property names to start with an uppercase letter. Making this change removed the error:
EnvironmentFiles:
  - Value: !Ref EnvFile
    Type: s3
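Besides the key casing, the task execution role has to be allowed to read the environment file from S3. The sketch below shows one way this could be expressed. It assumes that ECSTaskExecutionRole is an AWS::IAM::Role resource defined in the same template (if it is a parameter holding an ARN, the Roles entry would need a role name instead), that EnvFile holds the S3 object ARN of the file, and that EnvBucketArn is a hypothetical parameter holding the bucket ARN.

```yaml
Resources:
  EnvFileReadPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: read-env-file
      Roles:
        - !Ref ECSTaskExecutionRole   # assumption: a role resource in this template
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          # Read the environment file itself (EnvFile is the object ARN)
          - Effect: Allow
            Action: s3:GetObject
            Resource: !Ref EnvFile
          # Look up the bucket region (EnvBucketArn is a hypothetical parameter)
          - Effect: Allow
            Action: s3:GetBucketLocation
            Resource: !Ref EnvBucketArn
```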

AWS elasticsearch service with open access

I have this template that was working till February.
https://datameetgeobk.s3.amazonaws.com/cftemplates/EyeOfCustomer_updated.yaml.txt
Something related to Fine Grained access changed and I get the error...
Enable fine-grained access control or apply a restrictive access
policy to your domain (Service: AWSElasticsearch; Status Code: 400;
Error Code: ValidationException
This is just a test server and I do not want to protect it using Advanced security options.
The error you receive is because Amazon enabled fine-grained access control as part of its release in February 2020.
You can enable VPCOptions for the cluster, create a subnet and security group, and allow access through that security group. Add the VPC ID as a parameter, say pVpc (the default VPC in this case).
Add the VPC parameter:
pVpc:
  Description: VPC ID
  Type: String
  Default: default-xxadssad  # your default VPC ID
Add subnet & security group
ESSubnetA:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref pVpc
    # First availability zone of the stack's region
    AvailabilityZone: !Sub '${AWS::Region}a'
    # pVpcCIDR is assumed to be a CIDR parameter defined elsewhere in the template
    CidrBlock: !Ref pVpcCIDR
    Tags:
      - Key: Name
        Value: es-subneta
ESSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: SecurityGroup for Elasticsearch
    VpcId: !Ref pVpc
    SecurityGroupIngress:
      - FromPort: '443'
        IpProtocol: tcp
        ToPort: '443'
        CidrIp: 0.0.0.0/0
    Tags:
      - Key: Name
        Value: es-sg
Enable VPCOptions
VPCOptions:
  SubnetIds:
    - !Ref ESSubnetA
  SecurityGroupIds:
    - !Ref ESSecurityGroup
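To show where the VPCOptions block sits, here is a minimal sketch of a domain resource that uses it. The logical name ElasticsearchDomain, the version, the instance type and the volume size are illustrative assumptions rather than values from the original template, and ESSubnetA and ESSecurityGroup are the resources defined above.

```yaml
Resources:
  ElasticsearchDomain:            # hypothetical logical name
    Type: AWS::Elasticsearch::Domain
    Properties:
      ElasticsearchVersion: '7.4'               # illustrative value
      ElasticsearchClusterConfig:
        InstanceType: t2.small.elasticsearch    # illustrative value
        InstanceCount: 1
      EBSOptions:
        EBSEnabled: true
        VolumeSize: 10                          # GiB, illustrative value
      # Place the domain inside the VPC instead of exposing a public endpoint
      VPCOptions:
        SubnetIds:
          - !Ref ESSubnetA
        SecurityGroupIds:
          - !Ref ESSecurityGroup
```

With a single instance and no zone awareness, one subnet is enough; access is then governed by the security group rather than by a public endpoint.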

504 Gateway Timeout using Application Load Balancer in ECS

I am deploying a Laravel web application on ECS, and I am using an Application Load Balancer in order to enable autoscaling. The application worked (and scaled) perfectly until I introduced a heavyweight page, where I started to get 504 Gateway Timeout errors after a minute or so.
I am pretty sure the single web server has a higher timeout (this never happens when the application is tested locally), so the problem must be related to the AWS environment (ECS / ALB).
Below you can find a snippet of the ALB settings:
AdminLoadBalancer:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    SecurityGroups:
      - !Ref 'AlbSecurityGroup'
    Subnets:
      - !Ref 'PublicSubnetAz1'
      - !Ref 'PublicSubnetAz2'
    Scheme: internet-facing
    Name: !Join ['-', [!Ref 'AWS::StackName', 'lb']]
After some attempts, I solved the issue by setting the idle timeout attribute of the load balancer, as explained here; nothing was wrong with the individual ECS tasks. In CloudFormation, it was enough to add the attribute and double its default value of 60 seconds:
AdminLoadBalancer:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    LoadBalancerAttributes:
      - Key: 'idle_timeout.timeout_seconds'
        Value: 120
    SecurityGroups:
      - !Ref 'AlbSecurityGroup'
    Subnets:
      - !Ref 'PublicSubnetAz1'
      - !Ref 'PublicSubnetAz2'
    Scheme: internet-facing
    Name: !Join ['-', [!Ref 'AWS::StackName', 'lb']]

How do nested lists work, or how do I append to a list, in CloudFormation?

I want to refer to the security group that is created in the stack itself. I am trying this, but nothing works. Can someone help me out?
Parameters:
  env:
    Default: qa
    Type: String
Here are the mappings
Mappings:
  envMap:
    qa:
      securityGroups: 'sg-xxxxxxxx,sg-xxxxxxxx'
    sub:
      subnets: 'subnet-xxxxxxxx,subnet-xxxxxxxx'
I am creating Security Group and also want to map existing security groups as well.
Resources:
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow http to client host
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
  LoadBalancer:
    Type: 'AWS::ElasticLoadBalancingV2::LoadBalancer'
    Properties:
      SecurityGroups: !Split
        - ','
        - !Sub
          - '!Ref InstanceSecurityGroup,${mappedGroup}'
          - mappedGroup: !FindInMap
              - envMap
              - !Ref env
              - securityGroups
      Subnets: !Split
        - ','
        - !FindInMap
          - envMap
          - sub
          - subnets
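A likely reason the attempt above does not work is that '!Ref InstanceSecurityGroup' inside the quoted string is treated as literal text, so !Sub never resolves it. Inside a !Sub string, a resource attribute can be referenced as ${LogicalId.Attribute} instead. The following is a minimal sketch of that approach, reusing the names from the question. It assumes the mapped security groups and subnets live in the same VPC as the new security group.

```yaml
Parameters:
  env:
    Default: qa
    Type: String
Mappings:
  envMap:
    qa:
      securityGroups: 'sg-xxxxxxxx,sg-xxxxxxxx'
    sub:
      subnets: 'subnet-xxxxxxxx,subnet-xxxxxxxx'
Resources:
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow http to client host
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
  LoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      # Build one comma-separated string (new group ID + mapped IDs),
      # then split it into the list the property expects.
      SecurityGroups: !Split
        - ','
        - !Sub
          - '${InstanceSecurityGroup.GroupId},${mappedGroup}'
          - mappedGroup: !FindInMap [envMap, !Ref env, securityGroups]
      Subnets: !Split
        - ','
        - !FindInMap [envMap, sub, subnets]
```

The design choice here is to keep the mapping values as comma-separated strings and do the concatenation in !Sub, because the intrinsic functions offer no direct way to append one element to an existing list.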