ECS and Application Load Balancer - aws-cloudformation

I've been looking for information on CloudFormation with regard to creating a stack with ECS and an ELB (Application Load Balancer), but I haven't been able to find much.
I have created two Docker images, each containing a Node.js microservice that listens on port 3000 and 4000 respectively. How do I go about creating my stack with ECS and an ALB? I assume the Application Load Balancer can be configured to listen on both these ports?
A sample CloudFormation template would really help.

The Application Load Balancer can be used to distribute traffic across the ECS tasks in your service(s). It has two features you can leverage here: dynamic port mapping (the port on the host is auto-assigned by ECS/Docker), which allows you to run multiple tasks of the same service on a single EC2 instance, and path-based routing, which allows you to route incoming requests to different services depending on patterns in the URL path.
To wire it up, you first need to define a TargetGroup like this:
"TargetGroupService1" : {
"Type" : "AWS::ElasticLoadBalancingV2::TargetGroup",
"Properties" : {
"Port": 10,
"Protocol": "HTTP",
"HealthCheckPath": "/service1",
"VpcId": {"Ref" : "Vpc"}
}
}
If you are using dynamic port mapping, the port specified in the target group is irrelevant since it will be overridden by the dynamically allocated port for each target.
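Dynamic port mapping is enabled by setting the host port to 0 (or omitting it) in the task definition's port mappings. A minimal sketch of such a task definition, assuming a container named Task1 listening on 8080 (the image URI and memory value are placeholders):
"Task1": {
  "Type": "AWS::ECS::TaskDefinition",
  "Properties": {
    "ContainerDefinitions": [
      {
        "Name": "Task1",
        "Image": "my-account.dkr.ecr.my-region.amazonaws.com/service1:latest",
        "Memory": 256,
        "PortMappings": [
          { "ContainerPort": 8080, "HostPort": 0 }
        ]
      }
    ]
  }
}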
Next, you define a ListenerRule that specifies which path pattern gets routed to the TargetGroup:
"ListenerRuleService1": {
"Type" : "AWS::ElasticLoadBalancingV2::ListenerRule",
"Properties" : {
"Actions" : [
{
"TargetGroupArn" : {"Ref": "TargetGroupService1"},
"Type" : "forward"
}
],
"Conditions" : [
{
"Field" : "path-pattern",
"Values" : [ "/service1" ]
}
],
"ListenerArn" : {"Ref": "Listener"},
"Priority" : 1
}
}
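The ListenerRule references a Listener, which in turn belongs to the load balancer itself; neither is shown in the question. For completeness, a minimal sketch of what they might look like (the SubnetA, SubnetB, and AlbSecurityGroup parameters are assumptions):
"LoadBalancer": {
  "Type": "AWS::ElasticLoadBalancingV2::LoadBalancer",
  "Properties": {
    "Subnets": [ {"Ref": "SubnetA"}, {"Ref": "SubnetB"} ],
    "SecurityGroups": [ {"Ref": "AlbSecurityGroup"} ]
  }
},
"Listener": {
  "Type": "AWS::ElasticLoadBalancingV2::Listener",
  "Properties": {
    "LoadBalancerArn": {"Ref": "LoadBalancer"},
    "Port": 80,
    "Protocol": "HTTP",
    "DefaultActions": [
      { "Type": "forward", "TargetGroupArn": {"Ref": "TargetGroupService1"} }
    ]
  }
}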
Finally, you associate your ECS service with the TargetGroup. This enables ECS to automatically register your task containers as targets in the target group (with the host port that you have configured in your TaskDefinition):
"Service1": {
"Type" : "AWS::ECS::Service",
"DependsOn": [
"ListenerRuleService1"
],
"Properties" : {
"Cluster" : { "Ref" : "ClusterName" },
"DesiredCount" : 2,
"Role" : "/ecsServiceRole",
"TaskDefinition" : {"Ref":"Task1"},
"LoadBalancers": [
{
"ContainerName": "Task1",
"ContainerPort": "8080",
"TargetGroupArn" : { "Ref" : "TargetGroupService1" }
}
]
}
}
You can find more details in a blog post I have written about this; see Amazon ECS and Application Load Balancer.

If you're interested in doing this via https://www.terraform.io/, here's an example for two apps that share a domain:
https://ratelim.it => the Rails App running on container port 8100
https://ratelim.it/api => the Java API running on container port 8080
This example supports HTTP & HTTPS, and splits traffic between your apps based on the URL prefix.
my_app_task.json
"portMappings": [
{
"hostPort": 0,
"containerPort": 8100,
"protocol": "tcp"
}
],
my_api_task.json
"portMappings": [
{
"hostPort": 0,
"containerPort": 8080,
"protocol": "tcp"
}
],
Terraform code:
## ALB for both
resource "aws_alb" "app-alb" {
  name            = "app-alb"
  # subnets for the ALB would also need to be specified here
  security_groups = [
    "${aws_security_group.albs.id}"]
}

## ALB target for app
resource "aws_alb_target_group" "my_app" {
  name                 = "my-app"
  port                 = 80
  protocol             = "HTTP"
  vpc_id               = "${aws_vpc.myvpc.id}"
  deregistration_delay = 30

  health_check {
    protocol            = "HTTP"
    path                = "/healthcheck"
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 90
  }
}
## ALB Listener for app
resource "aws_alb_listener" "my_app" {
  load_balancer_arn = "${aws_alb.app-alb.id}"
  port              = "80"
  protocol          = "HTTP"

  default_action {
    target_group_arn = "${aws_alb_target_group.my_app.id}"
    type             = "forward"
  }
}

## ALB Listener for app https
resource "aws_alb_listener" "my_app_https" {
  load_balancer_arn = "${aws_alb.app-alb.id}"
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-2015-05"
  certificate_arn   = "${data.aws_acm_certificate.my_app.arn}"

  default_action {
    target_group_arn = "${aws_alb_target_group.my_app.id}"
    type             = "forward"
  }
}

## ALB Target for API
resource "aws_alb_target_group" "my_api" {
  name                 = "myapi"
  port                 = 80
  protocol             = "HTTP"
  vpc_id               = "${aws_vpc.myvpc.id}"
  deregistration_delay = 30

  health_check {
    path                = "/api/v1/status"
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 90
  }
}

## ALB Listener Rule for API
resource "aws_alb_listener_rule" "api_rule" {
  listener_arn = "${aws_alb_listener.my_app.arn}"
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = "${aws_alb_target_group.my_api.arn}"
  }

  condition {
    field  = "path-pattern"
    values = [
      "/api/*"]
  }
}
## ALB Listener Rule for API HTTPS
resource "aws_alb_listener_rule" "myapi_rule_https" {
  listener_arn = "${aws_alb_listener.my_app_https.arn}"
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = "${aws_alb_target_group.my_api.arn}"
  }

  condition {
    field  = "path-pattern"
    values = [
      "/api/*"]
  }
}
## APP Task
resource "aws_ecs_task_definition" "my_app" {
  family                = "my_app"
  container_definitions = "${data.template_file.my_app_task.rendered}"
}

## App Service
resource "aws_ecs_service" "my_app-service" {
  name            = "my_app-service"
  cluster         = "${aws_ecs_cluster.default.id}"
  task_definition = "${aws_ecs_task_definition.my_app.arn}"
  iam_role        = "${aws_iam_role.ecs_role.arn}"
  depends_on      = [
    "aws_iam_role_policy.ecs_service_role_policy"]

  load_balancer {
    target_group_arn = "${aws_alb_target_group.my_app.id}"
    container_name   = "my_app"
    container_port   = 8100
  }
}

## API Task
resource "aws_ecs_task_definition" "myapi" {
  family                = "myapi"
  container_definitions = "${data.template_file.myapi_task.rendered}"
}
## API Service
resource "aws_ecs_service" "myapi-service" {
  name            = "myapi-service"
  cluster         = "${aws_ecs_cluster.default.id}"
  task_definition = "${aws_ecs_task_definition.myapi.arn}"
  iam_role        = "${aws_iam_role.ecs_role.arn}"
  depends_on      = [
    "aws_iam_role_policy.ecs_service_role_policy"]

  load_balancer {
    target_group_arn = "${aws_alb_target_group.my_api.id}"
    container_name   = "myapi"
    container_port   = 8080
  }
}
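For completeness, the container_definitions above are rendered from the two JSON files shown at the top. A minimal sketch of the corresponding data sources (the file paths are assumptions):
data "template_file" "my_app_task" {
  # path to my_app_task.json is an assumption
  template = "${file("${path.module}/my_app_task.json")}"
}

data "template_file" "myapi_task" {
  # path to my_api_task.json is an assumption
  template = "${file("${path.module}/my_api_task.json")}"
}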

Are you trying to rebuild the entire ECS stack in CloudFormation? If you can live with pre-defined clusters, you can just register the instances with user data when they spin up (I use Spot Fleet, but this should work anywhere you're starting an instance). Something like this in your LaunchSpecifications:
"UserData":
{ "Fn::Base64" : { "Fn::Join" : [ "", [
"#!/bin/bash\n",
"yum update -y\n",
"echo ECS_CLUSTER=YOUR_CLUSTER_NAME >> /etc/ecs/ecs.config\n",
"yum install -y aws-cli\n",
"aws ec2 create-tags --region YOUR_REGION --resources $(curl http://169.254.169.254/latest/meta-data/instance-id) --tags Key=Name,Value=YOUR_INSTANCE_NAME\n"
]]}}
I know it's not pure infrastructure as code, but it gets the job done with minimal effort, and my cluster configs don't really change a lot.
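If the cluster is defined in the same template, the hard-coded cluster name can be replaced with a reference. A minimal sketch, assuming an AWS::ECS::Cluster resource named EcsCluster (Ref on that resource returns the cluster name); the UserData fragment goes back into the launch specification:
"EcsCluster": { "Type": "AWS::ECS::Cluster" },

"UserData": { "Fn::Base64": { "Fn::Join": [ "", [
  "#!/bin/bash\n",
  "echo ECS_CLUSTER=", { "Ref": "EcsCluster" }, " >> /etc/ecs/ecs.config\n"
]]}}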

Related

Connecting to a MongoDB cluster in Kubernetes

I have this Terraform config for creating a single MongoDB replica and a service, but I can't connect to Mongo using the CLI and the cluster domain name.
locals {
  labels = {
    "app" = "mongo"
  }
  volume_config_name = "mongo-config"
}

module "mongo" {
  source  = "terraform-iaac/stateful-set/kubernetes"
  version = "1.4.2"
  # insert the 3 required variables here
  image         = "mongo:4.4"
  name          = "mongodb"
  namespace     = kubernetes_namespace.cmprimg.metadata[0].name
  custom_labels = local.labels

  volume_host_path = [
    {
      volume_name  = "data"
      path_on_node = "/data/db"
    },
  ]
  volume_mount = [
    {
      mount_path  = "/data/db"
      volume_name = "data"
    },
    {
      mount_path  = "/etc/mongod.conf.orig"
      volume_name = "mongodb-conf"
      sub_path    = "configfile" // Key from configmap
    }
  ]
  volume_config_map = [{
    mode        = "0777"
    volume_name = "mongodb-conf"
    name        = "mongodb-confmap"
  }]
  # volume_claim = [
  #   {
  #     name                   = "mongo"
  #     namespace              = kubernetes_namespace.cmprimg.metadata[0].name
  #     access_modes           = ["ReadWriteOnce"]
  #     requests_storage       = "4Gi"
  #     persistent_volume_name = "mongo"
  #     storage_class_name     = "linode-block-storage-retain"
  #   }
  # ]

  env = {
    "MONGO_INITDB_ROOT_USERNAME" = var.username,
    "MONGO_INITDB_ROOT_PASSWORD" = var.password,
  }
  command = [
    "mongod",
    "--bind_ip",
    "0.0.0.0",
  ]
  internal_port = [
    {
      name          = "mongo"
      internal_port = 27017
    }
  ]
  resources = {
    request_cpu    = "100m"
    request_memory = "800Mi"
    limit_cpu      = "120m"
    limit_memory   = "900Mi"
  }
  replicas = 1
}

module "mongo_service" {
  source  = "terraform-iaac/service/kubernetes"
  version = "1.0.4"
  # insert the 3 required variables here
  app_name      = module.mongo.name
  app_namespace = kubernetes_namespace.cmprimg.metadata[0].name
  port_mapping = [
    {
      name          = "mongo"
      internal_port = 27107
      external_port = 27017
    }
  ]
  custom_labels = local.labels
}

resource "kubernetes_persistent_volume_claim" "example" {
  metadata {
    name      = "mongo"
    namespace = kubernetes_namespace.cmprimg.metadata[0].name
    labels    = local.labels
  }
  spec {
    access_modes = ["ReadWriteOnce"]
    resources {
      requests = {
        storage = "20Gi"
      }
    }
    storage_class_name = "linode-block-storage-retain"
  }
}

resource "kubernetes_config_map" "mongodb_conf" {
  metadata {
    name      = "mongodb-confmap"
    namespace = kubernetes_namespace.cmprimg.metadata[0].name
    labels    = local.labels
  }
  data = {
    "configfile" = yamlencode({
      storage : {
        dbPath : "/data/db",
      },
      net : {
        port : 27017,
        bindIp : "0.0.0.0",
      }
    })
  }
}
I can exec into the MongoDB pod and use the mongo CLI to connect via localhost, but when I'm in the same pod and use the CLI with the domain name mongodb.default.svc.cluster.local:27017 I get connection refused. I can see in the logs that MongoDB binds to 0.0.0.0, but I can't connect through the external port. Did I misconfigure the service or do something else wrong?
Take a closer look at this section:
port_mapping = [
  {
    name          = "mongo"
    internal_port = 27107
    external_port = 27017
  }
]
You use "internal" and "external" port numbers inconsistently across the file (the internal_port here is 27107, not 27017).
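A corrected mapping, assuming both sides should use MongoDB's default port 27017, would presumably be:
port_mapping = [
  {
    name          = "mongo"
    internal_port = 27017
    external_port = 27017
  }
]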
Also, are you sure the syntax of this block is correct? In the research I did online, the portMappings section usually has a different syntax (e.g. https://github.com/hashicorp/terraform-provider-aws/issues/21861):
portMappings = [
  {
    containerPort = var.container_port
    hostPort      = var.container_port
    protocol      = "tcp"
  }
]

How to reach PostgresCluster in Azure Cloud with a client from local?

I want to access the Postgres cluster inside my Kubernetes cluster on Azure with a client (e.g. pgAdmin) to search manually through the data.
At the moment my cluster only has one ingress, which points to a self-written API gateway.
I found a few ideas online and tried to add a load balancer in Kubernetes, without success.
My Postgres cluster in Terraform:
resource "helm_release" "postgres-cluster" {
name = "postgres-cluster"
repository = "https://charts.bitnami.com/bitnami"
chart = "postgresql-ha"
namespace = var.kube_namespace
set {
name = "global.postgresql.username"
value = var.postgresql_username
}
set {
name = "global.postgresql.password"
value = var.postgresql_password
}
}
This results in a running cluster.
Now my attempt to add a load balancer:
resource "kubernetes_manifest" "postgresql-loadbalancer" {
manifest = {
"apiVersion" = "v1"
"kind" = "Service"
"metadata" = {
"name" = "postgres-db-lb"
"namespace" = "${var.kube_namespace}"
}
"spec" = {
"selector" = {
"app.kubernetes.io/name" = "postgresql-ha"
}
"type" = "LoadBalancer"
"ports" = [{
"port" = "5432"
"targetPort" = "5432"
}]
}
}
}
The service gets created, but I still have no success when I try to connect to the external IP and port.
Found the answer: it was an internal firewall I hadn't thought of. The code is absolutely correct; a LoadBalancer does work here.

CannotPullContainerError: failed to extract layer

I'm trying to run a task on a Windows container in Fargate mode on AWS.
The container is a .NET console application (Full Framework 4.5).
This is the task definition, generated programmatically by the SDK:
var taskResponse = await ecsClient.RegisterTaskDefinitionAsync(new Amazon.ECS.Model.RegisterTaskDefinitionRequest()
{
    RequiresCompatibilities = new List<string>() { "FARGATE" },
    TaskRoleArn = TASK_ROLE_ARN,
    ExecutionRoleArn = EXECUTION_ROLE_ARN,
    Cpu = CONTAINER_CPU.ToString(),
    Memory = CONTAINER_MEMORY.ToString(),
    NetworkMode = NetworkMode.Awsvpc,
    Family = "netfullframework45consoleapp-task-definition",
    EphemeralStorage = new EphemeralStorage() { SizeInGiB = EPHEMERAL_STORAGE_SIZE_GIB },
    ContainerDefinitions = new List<Amazon.ECS.Model.ContainerDefinition>()
    {
        new Amazon.ECS.Model.ContainerDefinition()
        {
            Name = "netfullframework45consoleapp-task-definition",
            Image = "XXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/netfullframework45consoleapp:latest",
            Cpu = CONTAINER_CPU,
            Memory = CONTAINER_MEMORY,
            Essential = true
            //I REMOVED THE LOG DEFINITION TO SIMPLIFY THE PROBLEM
            //,
            //LogConfiguration = new Amazon.ECS.Model.LogConfiguration()
            //{
            //    LogDriver = LogDriver.Awslogs,
            //    Options = new Dictionary<string, string>()
            //    {
            //        { "awslogs-create-group", "true"},
            //        { "awslogs-group", $"/ecs/{TASK_DEFINITION_NAME}" },
            //        { "awslogs-region", AWS_REGION },
            //        { "awslogs-stream-prefix", $"{TASK_DEFINITION_NAME}" }
            //    }
            //}
        }
    }
});
These are the role policies used by the task (AmazonECSTaskExecutionRolePolicy):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
I get this error when launching the task:
CannotPullContainerError: ref pull has been retried 1 time(s): failed to extract layer sha256:fe48cee89971abac42eedb9110b61867659df00fc5b0b90dd91d6e19f704d935: link /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/212/fs/Files/ProgramData/Microsoft/Event Viewer/Views/ServerRoles/RemoteDesktop.Events.xml /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/212/fs/Files/Windows/Microsoft.NET/assembly/GAC_64/Microsoft.Windows.ServerManager.RDSPlugin/v4.0_10.0.0.0__31bf3856ad364e35/RemoteDesktop.Events.xml: no such file or directory: unknown
Some searching led me here:
https://aws.amazon.com/it/premiumsupport/knowledge-center/ecs-pull-container-api-error-ecr/
Point 1 says that if I run the task in a private subnet (as I'm doing) I need a NAT, with a related route, to guarantee communication with ECR. However, note that my infrastructure has a VPC endpoint to ECR.
So the first question is: is a VPC endpoint sufficient to guarantee communication from the container to the container image registry (ECR)? Or do I necessarily need to implement what point 1 says (NAT and route on the route table), or alternatively run the task in a public subnet?
Could the error be related to missing communication with ECR, or could it be a missing-policy problem?
Make sure your VPC endpoints are configured correctly. Note that:
"Amazon ECS tasks hosted on Fargate using platform version 1.4.0 or later require both the com.amazonaws.region.ecr.dkr and com.amazonaws.region.ecr.api Amazon ECR VPC endpoints as well as the Amazon S3 gateway endpoint to take advantage of this feature."
See https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-endpoints.html for more information.
From the first paragraph of the page I linked: "You don't need an internet gateway, a NAT device, or a virtual private gateway."
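For reference, a minimal Terraform sketch of the three endpoints the docs call out (the VPC, subnet, security group, and route table references are assumptions; the region matches the one in the question):
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id                   # assumed VPC
  service_name        = "com.amazonaws.eu-west-1.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.private.id]           # assumed private subnet
  security_group_ids  = [aws_security_group.endpoints.id] # assumed SG allowing 443 from the tasks
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.eu-west-1.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.private.id]
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.eu-west-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]        # assumed route table of the private subnet
}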

Grafana 7.4.3 /var/lib/grafana not writeable in AWS ECS - EFS

I'm trying to host a Grafana 7.4 image on ECS Fargate, using an EFS volume for persistent storage.
Using Terraform I have created the required resources and given the task access to the EFS volume via an access point:
resource "aws_efs_access_point" "default" {
file_system_id = aws_efs_file_system.default.id
posix_user {
gid = 0
uid = 472
}
root_directory {
path = "/opt/data"
creation_info {
owner_gid = 0
owner_uid = 472
permissions = "600"
}
}
}
Note that I have set the owner permissions as per the guide in https://grafana.com/docs/grafana/latest/installation/docker/#migrate-from-previous-docker-containers-versions (I've tried both group id 0 and 1, as the documentation seems to be inconsistent on the gid).
Using a base Alpine image in place of the Grafana image, I've confirmed that the directory /var/lib/grafana exists within the container with the correct uid and gid set. However, on attempting to run the Grafana image I get the error message:
GF_PATHS_DATA='/var/lib/grafana' is not writable.
I am launching the task with a Terraformed task definition:
resource "aws_ecs_task_definition" "default" {
family = "${var.name}"
container_definitions = "${data.template_file.container_definition.rendered}"
memory = "${var.memory}"
cpu = "${var.cpu}"
requires_compatibilities = [
"FARGATE"
]
network_mode = "awsvpc"
execution_role_arn = "arn:aws:iam::REDACTED_ID:role/ecsTaskExecutionRole"
volume {
name = "${var.name}-volume"
efs_volume_configuration {
file_system_id = aws_efs_file_system.default.id
transit_encryption = "ENABLED"
root_directory = "/opt/data"
authorization_config {
access_point_id = aws_efs_access_point.default.id
}
}
}
tags = {
Product = "${var.name}"
}
}
With the container definition:
[
  {
    "portMappings": [
      {
        "hostPort": 80,
        "protocol": "tcp",
        "containerPort": 80
      }
    ],
    "mountPoints": [
      {
        "sourceVolume": "${volume_name}",
        "containerPath": "/var/lib/grafana",
        "readOnly": false
      }
    ],
    "cpu": 0,
    "secrets": [
      ...
    ],
    "environment": [],
    "image": "grafana/grafana:7.4.3",
    "name": "${name}",
    "user": "472:0"
  }
]
For "user" I have tried "grafana", "742:0", "742" and "742:1" when trying gid 1.
I believe the Terraform, security groups, mount targets, etc. are all correct, as I can get an Alpine image to run:
ls -lash /var/lib
> drw------- 2 472 root 6.0K Mar 12 11:22 grafana
I believe you have a problem because of this AWS ECS issue: https://github.com/aws/containers-roadmap/issues/938
Anyway, the file-system approach doesn't seem to be very cloud friendly (especially if you want to scale horizontally: problems with concurrent writes from multiple tasks, IOPS limitations, ...). Just provision a proper DB (e.g. Aurora RDS MySQL, a multi-AZ cluster if you need HA) and you will have a nice opsless AWS deployment.
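If you go that route, Grafana can be pointed at an external database purely via environment variables in the container definition. A minimal sketch (the host value and secret ARNs are placeholders):
"environment": [
  { "name": "GF_DATABASE_TYPE", "value": "mysql" },
  { "name": "GF_DATABASE_HOST", "value": "my-grafana-db.cluster-xxxx.eu-west-1.rds.amazonaws.com:3306" },
  { "name": "GF_DATABASE_NAME", "value": "grafana" }
],
"secrets": [
  { "name": "GF_DATABASE_USER", "valueFrom": "arn-of-db-user-secret" },
  { "name": "GF_DATABASE_PASSWORD", "valueFrom": "arn-of-db-password-secret" }
]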

PowerShell script with parameter for Windows VM instance on Google Cloud Platform

I am trying to deploy a Windows VM on Google Cloud Platform through Terraform. The VM is getting deployed, and I am able to execute PowerShell scripts by using windows-startup-script-url.
With this approach I can only use scripts that are already stored in Google Storage. If the script has parameters / variables, how do I pass those parameters? Any clue?
provider "google" {
project = "my-project"
region = "my-location"
zone = "my-zone"
}
resource "google_compute_instance" "default" {
name = "my-name"
machine_type = "n1-standard-2"
zone = "my-zone"
boot_disk {
initialize_params {
image = "windows-cloud/windows-2019"
}
}
metadata {
windows-startup-script-url = "gs://<my-storage>/<my-script.ps1>"
}
network_interface {
network = "default"
access_config {
}
}
tags = ["http-server", "windows-server"]
}
resource "google_compute_firewall" "http-server" {
name = "default-allow-http"
network = "default"
allow {
protocol = "tcp"
ports = ["80"]
}
source_ranges = ["0.0.0.0/0"]
target_tags = ["http-server"]
}
resource "google_compute_firewall" "windows-server" {
name = "windows-server"
network = "default"
allow {
protocol = "tcp"
ports = ["3389"]
}
source_ranges = ["0.0.0.0/0"]
target_tags = ["windows-server"]
}
output "ip" {
value = "${google_compute_instance.default.network_interface.0.access_config.0.nat_ip}"
}
Terraform doesn't necessarily require startup scripts to be pulled from GCS buckets.
The example here shows:
  }

  metadata = {
    foo = "bar"
  }

  metadata_startup_script = "echo hi > /test.txt"

  service_account {
    scopes = ["userinfo-email", "compute-ro", "storage-ro"]
  }
}
More in the official docs for GCE and PowerShell scripting here.
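For passing parameters specifically, one option is to render the PowerShell inline instead of pulling it from GCS. A minimal sketch, assuming Terraform 0.12+, a hypothetical template file my-script.ps1.tpl that references ${app_port}, and a hypothetical variable named app_port:
resource "google_compute_instance" "default" {
  name         = "my-name"
  machine_type = "n1-standard-2"
  zone         = "my-zone"

  boot_disk {
    initialize_params {
      image = "windows-cloud/windows-2019"
    }
  }

  network_interface {
    network = "default"
    access_config {}
  }

  metadata = {
    # windows-startup-script-ps1 runs the rendered PowerShell at first boot;
    # my-script.ps1.tpl and var.app_port are hypothetical placeholders
    windows-startup-script-ps1 = templatefile("${path.module}/my-script.ps1.tpl", {
      app_port = var.app_port
    })
  }
}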