Does ECS with Fargate support Detailed Monitoring? If so, how do I enable it for a service in Terraform? - amazon-ecs

I'd like to enable 1-minute resolution for the CloudWatch metrics ECSServiceAverageMemoryUtilization and ECSServiceAverageCPUUtilization. I understand that by default they are only updated every 5 minutes unless you enable Detailed Monitoring, but I'm unclear on how to enable it.
I googled around and had no luck, only finding descriptions of what Detailed Monitoring is, how much it costs, etc.
I looked through both these documents:
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ecs_task_definition
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ecs_service
but see no mention of it.
Is this maybe an account setting instead of an IaC setting?

The closest thing to what you are looking for is ECS Container Insights.
You enable that at the ECS cluster level in Terraform:
resource "aws_ecs_cluster" "cluster" {
  name = "my-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}
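Once Container Insights is enabled, the metrics can be read back at a 60-second period. Below is a minimal Python sketch that only builds the arguments for CloudWatch's get_metric_statistics call. The namespace "ECS/ContainerInsights", the metric name "CpuUtilized" and the ClusterName/ServiceName dimensions are stated as I understand Container Insights to publish them, so verify them in the CloudWatch console; "my-service" is a hypothetical service name.

```python
from datetime import datetime, timedelta, timezone


def container_insights_query(cluster, service, metric="CpuUtilized",
                             minutes=30, now=None):
    """Build kwargs for CloudWatch get_metric_statistics with a 60 s period.

    Assumption: the namespace, metric and dimension names below match what
    Container Insights publishes for your cluster; check them in the console.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "ECS/ContainerInsights",
        "MetricName": metric,
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Average"],
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   cw = boto3.client("cloudwatch")
#   stats = cw.get_metric_statistics(
#       **container_insights_query("my-cluster", "my-service"))
```

The arguments are built separately from the call so they can be inspected or logged before anything is sent to AWS.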

Related

CloudFormation: stack is stuck, CloudTrail events shows repeating DeleteNetworkInterface event

I am deploying a stack with CDK. It gets stuck in CREATE_IN_PROGRESS. CloudTrail logs show repeating events in logs:
DeleteNetworkInterface
CreateLogStream
What should I look at next to continue debugging? Is there a known reason for this to happen?
I saw exactly the same issue when deploying a CDK-based ECS/Fargate stack.
In my case, I was able to diagnose the issue by following the AWS support article https://aws.amazon.com/premiumsupport/knowledge-center/cloudformation-stack-stuck-progress/
What specifically diagnosed and then resolved it for me:
I updated my ECS service to set its desired task count to 0. At that point the CloudFormation stack completed successfully.
From that, it became clear that the actual problem was the creation of the initial task for my ECS service. I diagnosed it by reviewing the output in the Deployment and Events tabs of the ECS service in the AWS Management Console. In my case, task creation was failing because of a problem accessing the associated ECR repository. There could be other causes, but they should show up there.
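The same two steps (scale the service to zero, then read the service events) can also be done from a script instead of the console. This is a minimal sketch, assuming a boto3 ECS client is passed in; the cluster and service names in the usage comment are hypothetical.

```python
def scale_service_to_zero(ecs_client, cluster, service):
    """Set the desired task count to 0 so the stack stops waiting on tasks."""
    return ecs_client.update_service(
        cluster=cluster, service=service, desiredCount=0
    )


def recent_service_events(ecs_client, cluster, service, limit=10):
    """Return the messages of the first `limit` events listed for the service."""
    response = ecs_client.describe_services(cluster=cluster, services=[service])
    services = response.get("services", [])
    if not services:
        return []
    return [event["message"] for event in services[0].get("events", [])[:limit]]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   ecs = boto3.client("ecs")
#   for message in recent_service_events(ecs, "my-cluster", "my-service"):
#       print(message)
#   scale_service_to_zero(ecs, "my-cluster", "my-service")
```

Reading the events before scaling down is usually the more informative order, since the failure reason (for example an image pull error) tends to appear there.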

How do you deploy GeoIP on ECS Fargate?

How can I productionise https://hub.docker.com/r/fiorix/freegeoip so that it is launched as a Fargate task, and how should I handle the geoipupdate functionality so that GeoLite2-City.mmdb is kept up to date in the task?
I have the required environment details such as GEOIPUPDATE_ACCOUNT_ID, GEOIPUPDATE_LICENSE_KEY and GEOIPUPDATE_EDITION_IDS, but I could not work out the deployment flow, because there are two separate Dockerfiles/images: one for geoip and one for geoipupdate.
Has anyone deployed this on Fargate? If so, could you please list the high-level steps? I have already tried to find out whether such a thing has been deployed on ECS, but I can only find examples for Lambda and EC2.
Thanks

Why does ECS integration new-relic task require AmazonEC2ContainerServiceforEC2Role?

We are trying to install the New Relic ECS integration on our clusters using the AWS CloudFormation method, as described in this link
I observed that this CloudFormation template first creates a few IAM roles for the task that will be executed as a daemon service. One of the roles attached to the task is AmazonEC2ContainerServiceforEC2Role, which includes permissions to operate on container instances, including deregistering a container instance.
I would like to understand under what circumstances this daemon task would be required to deregister an instance or, for that matter, create a cluster or register an instance. The complete list of permissions granted is below. Can someone please explain why these are needed in the first place?
I tried asking on the New Relic discussion forums but haven't had any luck yet.
"ec2:DescribeTags", "ecs:CreateCluster", "ecs:DeregisterContainerInstance", "ecs:DiscoverPollEndpoint", "ecs:Poll", "ecs:RegisterContainerInstance", "ecs:StartTelemetrySession", "ecs:UpdateContainerInstancesState", "ecs:Submit*", "ecr:GetAuthorizationToken", "ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage", "logs:CreateLogStream", "logs:PutLogEvents"

Kubernetes: Policy check before container execution

I am new to Kubernetes. I would like to know whether it is possible to hook into the container execution life cycle events in the orchestration process, so that I can call an API with the details of the container and check whether it is allowed to run in the given environment, location, etc.
An example check could be: a container may only run in European or US data centers, so if someone tries to run it in a data center outside these regions, it should not be allowed.
Is this possible and what is the best way to achieve this?
You can possibly set up an ImagePolicy admission controller in the clusters, where you describe which registries images are allowed to be pulled from.
kube-image-bouncer is an example of an ImagePolicy admission controller
A simple webhook endpoint server that can be used to validate the images being created inside of the kubernetes cluster.
If you don't want to start from scratch...there is a Cloud Native Computing Foundation (incubating) project - Open Policy Agent with support for Kubernetes that seems to offer what you want. (I am not affiliated with the project)
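To make the admission-webhook idea concrete, here is a minimal Python sketch of the decision step only: it takes an admission.k8s.io/v1 AdmissionReview for a Pod and returns the response body that allows or denies it. The region check is hypothetical: `cluster_region` and the `allowed_regions_for_image` lookup (which could wrap the external API from the question) are assumptions, and the HTTPS server and webhook registration are left out.

```python
def review_pod(admission_review, cluster_region, allowed_regions_for_image):
    """Return an AdmissionReview response allowing or denying a Pod.

    allowed_regions_for_image is a hypothetical callable mapping an image
    name to the set of regions it may run in (e.g. backed by a policy API).
    """
    request = admission_review["request"]
    pod = request["object"]
    containers = pod.get("spec", {}).get("containers", [])

    # Collect every image that is not permitted in this cluster's region.
    denied = [
        c["image"] for c in containers
        if cluster_region not in allowed_regions_for_image(c["image"])
    ]

    response = {"uid": request["uid"], "allowed": not denied}
    if denied:
        response["status"] = {
            "message": "images not allowed in region %s: %s"
                       % (cluster_region, ", ".join(denied))
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

The response must echo the request's `uid`. Keeping the decision in a plain function makes the policy easy to unit-test before wiring it into a webhook server.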

Query Stackdriver Uptime Checks

I am trying to query for the Stackdriver Uptime Checks using the google monitoring api. I cannot seem to find anything in their documentation that illustrates how to query for the uptime checks that were set up on stackdriver. Here are some of the docs I have been reading through. You will note that some of the query-able metrics include agent.googleapis.com/agent/uptime but this does not return the uptime checks seen on Stackdriver Uptime Checks. Below I am listing some of the documentation I have been sifting through in case it may be helpful.
Does anyone know how/if this can be done?
Google Python Client Docs
Time Series Query
Metrics
I'm a product manager on the Stackdriver team. Unfortunately, Uptime Check metrics are not currently available via the Stackdriver Metrics API. This is a feature we're actively working to provide. I'll follow-up on this thread when the feature is released.
Thank you for your question and for using Stackdriver!
It's my understanding that this metric can now be externally queried as:
monitoring.googleapis.com/uptime_check/check_passed
You can see it referenced in the sample alerting policy JSON for creating uptime check alerting policies.
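As a rough illustration of querying that metric, the Python sketch below builds the request URL for the Monitoring API v3 projects.timeSeries.list method with a filter on the metric type. It only constructs the URL; the project ID "my-project" is hypothetical, and the actual GET request would still need an OAuth access token.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

UPTIME_METRIC = "monitoring.googleapis.com/uptime_check/check_passed"


def uptime_time_series_url(project_id, end, minutes=10):
    """Build a timeSeries.list URL for the uptime check metric.

    `end` must be a timezone-aware datetime; the interval covers the
    `minutes` before it.
    """
    start = end - timedelta(minutes=minutes)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamp in UTC
    params = {
        "filter": 'metric.type="%s"' % UPTIME_METRIC,
        "interval.startTime": start.astimezone(timezone.utc).strftime(fmt),
        "interval.endTime": end.astimezone(timezone.utc).strftime(fmt),
    }
    return "https://monitoring.googleapis.com/v3/projects/%s/timeSeries?%s" % (
        project_id, urlencode(params)
    )


# Usage:
#   url = uptime_time_series_url("my-project", datetime.now(timezone.utc))
#   # then send a GET with an "Authorization: Bearer <token>" header
```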