Grafana 7.4.3 /var/lib/grafana not writable in AWS ECS with EFS

I'm trying to host a Grafana 7.4 image in ECS Fargate using an EFS volume for persistent storage.
Using Terraform, I have created the required resources and given the task access to the EFS volume via an "access point":
resource "aws_efs_access_point" "default" {
file_system_id = aws_efs_file_system.default.id
posix_user {
gid = 0
uid = 472
}
root_directory {
path = "/opt/data"
creation_info {
owner_gid = 0
owner_uid = 472
permissions = "600"
}
}
}
Note that I have set the owner permissions as per the guide at https://grafana.com/docs/grafana/latest/installation/docker/#migrate-from-previous-docker-containers-versions (I've tried both group id 0 and 1, as the documentation seems to be inconsistent about the gid).
Using a base alpine image in place of the grafana image, I've confirmed that the directory /var/lib/grafana exists within the container with the correct uid and gid set. However, on attempting to run the grafana image I get the error message:
GF_PATHS_DATA='/var/lib/grafana' is not writable.
I am launching the task with a terraformed task definition.
resource "aws_ecs_task_definition" "default" {
family = "${var.name}"
container_definitions = "${data.template_file.container_definition.rendered}"
memory = "${var.memory}"
cpu = "${var.cpu}"
requires_compatibilities = [
"FARGATE"
]
network_mode = "awsvpc"
execution_role_arn = "arn:aws:iam::REDACTED_ID:role/ecsTaskExecutionRole"
volume {
name = "${var.name}-volume"
efs_volume_configuration {
file_system_id = aws_efs_file_system.default.id
transit_encryption = "ENABLED"
root_directory = "/opt/data"
authorization_config {
access_point_id = aws_efs_access_point.default.id
}
}
}
tags = {
Product = "${var.name}"
}
}
With the container definition
[
{
"portMappings": [
{
"hostPort": 80,
"protocol": "tcp",
"containerPort": 80
}
],
"mountPoints": [
{
"sourceVolume": "${volume_name}",
"containerPath": "/var/lib/grafana",
"readOnly": false
}
],
"cpu": 0,
"secrets": [
...
],
"environment": [],
"image": "grafana/grafana:7.4.3",
"name": "${name}",
"user": "472:0"
}
]
For "user" I have also tried "grafana", "742:0", "742" and, when trying gid 1, "742:1".
I believe the Terraform, security groups, mount_targets, etc. are all correct, as I can get an alpine image to run:
ls -lash /var/lib
> drw------- 2 472 root 6.0K Mar 12 11:22 grafana

I believe the problem is on the AWS ECS side, see https://github.com/aws/containers-roadmap/issues/938
Anyway, the file system approach doesn't seem very cloud friendly, especially if you want to scale horizontally (problems with concurrent writes from multiple tasks, IOPS limitations, ...). Just provision a proper DB (e.g. RDS Aurora MySQL, as a Multi-AZ cluster if you need HA) and you will have a nice opsless AWS deployment.
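Separately from the linked issue, one detail in the question may be worth checking: the access point creates the root directory with permissions "600", and the ls output shows drw-------. On a POSIX file system, a non-root owner needs both the write bit and the execute (search) bit on a directory to create files in it. The Python sketch below only illustrates that rule; owner_can_create_files is a hypothetical helper written for this example, and it looks at mode bits only (it ignores ACLs and root's override).

```python
import stat


def owner_can_create_files(mode: int) -> bool:
    """Return True if a directory's owner bits allow creating entries in it.

    Creating a file needs write permission on the directory (to add the
    entry) and execute/search permission (to traverse into it).
    """
    needed = stat.S_IWUSR | stat.S_IXUSR
    return (mode & needed) == needed


# Mode "600" as in creation_info above: owner read + write, no search bit.
print(owner_can_create_files(0o600))
# A common directory mode where the owner has read, write and search.
print(owner_can_create_files(0o755))
```

If this turns out to be the cause, a mode that includes the owner execute bit (for example "755") would be the thing to try. As far as I know, creation_info is only applied when the access point creates the directory, so a directory that already exists keeps its old mode.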

Related

CannotPullContainerError: failed to extract layer

I'm trying to run a task with a Windows container in Fargate mode on AWS.
The container is a .NET console application (full .NET Framework 4.5).
This is the task definition, generated programmatically with the SDK:
var taskResponse = await ecsClient.RegisterTaskDefinitionAsync(new Amazon.ECS.Model.RegisterTaskDefinitionRequest()
{
RequiresCompatibilities = new List<string>() { "FARGATE" },
TaskRoleArn = TASK_ROLE_ARN,
ExecutionRoleArn = EXECUTION_ROLE_ARN,
Cpu = CONTAINER_CPU.ToString(),
Memory = CONTAINER_MEMORY.ToString(),
NetworkMode = NetworkMode.Awsvpc,
Family = "netfullframework45consoleapp-task-definition",
EphemeralStorage = new EphemeralStorage() { SizeInGiB = EPHEMERAL_STORAGE_SIZE_GIB },
ContainerDefinitions = new List<Amazon.ECS.Model.ContainerDefinition>()
{
new Amazon.ECS.Model.ContainerDefinition()
{
Name = "netfullframework45consoleapp-task-definition",
Image = "XXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/netfullframework45consoleapp:latest",
Cpu = CONTAINER_CPU,
Memory = CONTAINER_MEMORY,
Essential = true
//I REMOVED THE LOG DEFINITION TO SIMPLIFY THE PROBLEM
//,
//LogConfiguration = new Amazon.ECS.Model.LogConfiguration()
//{
// LogDriver = LogDriver.Awslogs,
// Options = new Dictionary<string, string>()
// {
// { "awslogs-create-group", "true"},
// { "awslogs-group", $"/ecs/{TASK_DEFINITION_NAME}" },
// { "awslogs-region", AWS_REGION },
// { "awslogs-stream-prefix", $"{TASK_DEFINITION_NAME}" }
// }
//}
}
}
});
These are the policies attached to the role used by the task (AmazonECSTaskExecutionRolePolicy):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "*"
}
]
}
I get this error when I launch the task:
CannotPullContainerError: ref pull has been retried 1 time(s): failed to extract layer sha256:fe48cee89971abac42eedb9110b61867659df00fc5b0b90dd91d6e19f704d935: link /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/212/fs/Files/ProgramData/Microsoft/Event Viewer/Views/ServerRoles/RemoteDesktop.Events.xml /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/212/fs/Files/Windows/Microsoft.NET/assembly/GAC_64/Microsoft.Windows.ServerManager.RDSPlugin/v4.0_10.0.0.0__31bf3856ad364e35/RemoteDesktop.Events.xml: no such file or directory: unknown
Some searching led me here:
https://aws.amazon.com/it/premiumsupport/knowledge-center/ecs-pull-container-api-error-ecr/
Point 1 says that if I run the task in a private subnet (as I'm doing), I need a NAT with a related route to guarantee communication with ECR. Note, however, that my infrastructure has a VPC endpoint to ECR.
So the first question is: is a VPC endpoint sufficient to guarantee communication from the container to the container image registry (ECR)? Or do I necessarily need to implement what point 1 says (NAT and a route in the route table), or else run the task in a public subnet?
Could the error be related to missing connectivity to ECR, or could it be a missing policy?
Make sure your VPC endpoint is configured correctly. Note that
"Amazon ECS tasks hosted on Fargate using platform version 1.4.0 or later require both the com.amazonaws.region.ecr.dkr and com.amazonaws.region.ecr.api Amazon ECR VPC endpoints as well as the Amazon S3 gateway endpoint to take advantage of this feature."
See https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-endpoints.html for more information.
The first paragraph of that page also says: "You don't need an internet gateway, a NAT device, or a virtual private gateway."
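The requirement quoted above can be turned into a small check. This is only a sketch: both function names are hypothetical, and the list of existing service names is assumed to come from your own inventory (for example the ServiceName fields returned by boto3's EC2 describe_vpc_endpoints call). It compares names only, so it does not verify that the S3 endpoint is of the gateway type.

```python
from typing import Iterable, List, Set


def required_ecr_endpoint_services(region: str) -> Set[str]:
    """Service names named in the ECR docs quote for Fargate 1.4.0 or later."""
    return {
        f"com.amazonaws.{region}.ecr.dkr",
        f"com.amazonaws.{region}.ecr.api",
        f"com.amazonaws.{region}.s3",  # must be a gateway endpoint
    }


def missing_ecr_endpoint_services(region: str, existing: Iterable[str]) -> List[str]:
    """Return the required service names that are not in `existing`."""
    return sorted(required_ecr_endpoint_services(region) - set(existing))


# Example: a VPC that only has the ecr.dkr interface endpoint so far.
existing_services = ["com.amazonaws.eu-west-1.ecr.dkr"]
print(missing_ecr_endpoint_services("eu-west-1", existing_services))
```

An empty result means all three endpoints from the quote are present by name; anything listed still has to be created in the VPC where the task runs.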

How to provision RDS postgres db users with AWS IAM auth using terraform?

According to this AWS knowledge-center article: https://aws.amazon.com/premiumsupport/knowledge-center/users-connect-rds-iam/ I need to create a DB user after logging in with the master username and password:
CREATE USER {dbusername} IDENTIFIED WITH AWSAuthenticationPlugin as 'RDS';
I can see that Terraform has mysql_user to provision MySQL DB users: https://www.terraform.io/docs/providers/mysql/r/user.html
However, I couldn't find a postgres_user resource. Is there a way to provision a Postgres user with IAM auth?
In Postgres, a user is called a "role". The Postgres docs say:
a role can be considered a "user", a "group", or both depending on how it is used
So, the TF resource to create is a postgresql_role:
resource "postgresql_role" "my_replication_role" {
name = "replication_role"
replication = true
login = true
connection_limit = 5
password = "md5c98cbfeb6a347a47eb8e96cfb4c4b890"
}
To enable an IAM user to assume the role, follow the steps in the AWS docs.
From those instructions, you would end up with TF code looking something like:
module "db" {
source = "terraform-aws-modules/rds/aws"
// ...
}
provider "postgresql" {
// ...
}
resource "postgresql_role" "pguser" {
login = true
name = var.pg_username
password = var.pg_password
roles = ["rds_iam"]
}
resource "aws_iam_user" "pguser" {
name = var.pg_username
}
resource "aws_iam_user_policy" "pguser" {
name = var.pg_username
user = aws_iam_user.pguser.id
policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"rds-db:connect"
],
"Resource": [
"arn:aws:rds-db:${var.region}:${data.aws_caller_identity.current.account_id}:dbuser:${module.db.this_db_instance_resource_id}/${var.pg_username}"
]
}
]
}
EOF
}
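The Resource string in the policy above is easy to get wrong, so here is a minimal Python sketch of how it is assembled. The function names and the sample values (account id, resource id, user name) are placeholders made up for illustration. Note that the ARN uses the instance's resource id (the value starting with "db-"), not the instance identifier.

```python
import json


def rds_db_user_arn(region: str, account_id: str, db_resource_id: str, db_username: str) -> str:
    """Build the rds-db ARN that the rds-db:connect action is granted on."""
    return f"arn:aws:rds-db:{region}:{account_id}:dbuser:{db_resource_id}/{db_username}"


def rds_connect_policy(region: str, account_id: str, db_resource_id: str, db_username: str) -> dict:
    """Return an IAM policy document equivalent to the heredoc above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["rds-db:connect"],
                "Resource": [rds_db_user_arn(region, account_id, db_resource_id, db_username)],
            }
        ],
    }


# Placeholder values, not real identifiers.
policy = rds_connect_policy("us-east-1", "123456789012", "db-EXAMPLERESOURCEID", "pguser")
print(json.dumps(policy, indent=2))
```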

How to set aws cloudwatch retention via Terraform

I'm using Terraform to deploy API Gateway/Lambda and already have the appropriate logs in CloudWatch. However, I can't seem to find a way to set the retention on the logs via Terraform using my currently deployed resources (below). It looks like the log group resource is where I'd do it, but I'm not sure how to point the log stream from API Gateway at the new log group. I must be missing something obvious... any advice is very much appreciated!
resource "aws_api_gateway_account" "name" {
cloudwatch_role_arn = "${aws_iam_role.cloudwatch.arn}"
}
resource "aws_iam_role" "cloudwatch" {
name = "#{name}_APIGatewayCloudWatchLogs"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "apigateway.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
}
resource "aws_iam_policy_attachment" "api_gateway_logs" {
name = "#{name}_api_gateway_logs_policy_attach"
roles = ["${aws_iam_role.cloudwatch.id}"]
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"
}
resource "aws_api_gateway_method_settings" "name" {
rest_api_id = "${aws_api_gateway_rest_api.name.id}"
stage_name = "${aws_api_gateway_stage.name.stage_name}"
method_path = "${aws_api_gateway_resource.name.path_part}/${aws_api_gateway_method.name.http_method}"
settings {
metrics_enabled = true
logging_level = "INFO"
data_trace_enabled = true
}
}
Yes, you can create the log group resource under the name Lambda will log to, before you create the Lambda function. Or you can import the existing log groups.
resource "aws_cloudwatch_log_group" "lambda" {
name = "/aws/lambda/${var.env}-${join("", split("_",title(var.lambda_name)))}-Lambda"
retention_in_days = 7
lifecycle {
create_before_destroy = true
prevent_destroy = false
}
}
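The approach works because these services write to log groups with predictable names, so a log group declared (or imported) under exactly that name is the one whose retention applies. A small Python sketch of the naming follows. The function names are hypothetical, and the API Gateway format is the one I understand REST API execution logging to use, so check it against the log groups that already exist in your account.

```python
def lambda_log_group_name(function_name: str) -> str:
    """Log group that a Lambda function writes to by default."""
    return f"/aws/lambda/{function_name}"


def api_gateway_execution_log_group_name(rest_api_id: str, stage_name: str) -> str:
    """Log group used for API Gateway REST API execution logs (assumed format)."""
    return f"API-Gateway-Execution-Logs_{rest_api_id}/{stage_name}"


# Placeholder function name, API id and stage, for illustration only.
print(lambda_log_group_name("dev-MyFunction-Lambda"))
print(api_gateway_execution_log_group_name("abc123defg", "prod"))
```

Either string can be used as the name of an aws_cloudwatch_log_group with retention_in_days set, which is how the retention gets applied to the API Gateway logs asked about in the question.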

Not able to retrieve RedShift cluster Capacity details like Storage, Memory using Python script

I have tried to fetch my Redshift cluster details. I can see many details about the cluster, but a few are missing, for example Storage and Memory.
Below is the code:
import boto3
# `role` holds temporary credentials obtained earlier (not shown here)
redshiftClient = boto3.client('redshift', aws_access_key_id = role.credentials.access_key,
aws_secret_access_key = role.credentials.secret_key, aws_session_token = role.credentials.session_token, region_name='us-west-2')
# Getting all the clusters
clusters = redshiftClient.describe_clusters()
Can you please suggest a way to get them?
Thanks.
The describe-clusters command does not return that type of information. The output of that command is:
{
"Clusters": [
{
"NodeType": "dw.hs1.xlarge",
"Endpoint": {
"Port": 5439,
"Address": "mycluster.coqoarplqhsn.us-east-1.redshift.amazonaws.com"
},
"ClusterVersion": "1.0",
"PubliclyAccessible": "true",
"MasterUsername": "adminuser",
"ClusterParameterGroups": [
{
"ParameterApplyStatus": "in-sync",
"ParameterGroupName": "default.redshift-1.0"
} ],
"ClusterSecurityGroups": [
{
"Status": "active",
"ClusterSecurityGroupName": "default"
} ],
"AllowVersionUpgrade": true,
"VpcSecurityGroups": [],
"AvailabilityZone": "us-east-1a",
"ClusterCreateTime": "2013-01-22T21:59:29.559Z",
"PreferredMaintenanceWindow": "sat:03:30-sat:04:00",
"AutomatedSnapshotRetentionPeriod": 1,
"ClusterStatus": "available",
"ClusterIdentifier": "mycluster",
"DBName": "dev",
"NumberOfNodes": 2,
"PendingModifiedValues": {}
} ],
"ResponseMetadata": {
"RequestId": "65b71cac-64df-11e2-8f5b-e90bd6c77476"
}
}
You will need to retrieve Memory and Storage statistics from Amazon CloudWatch.
See your other question: Amazon CloudWatch is not returning Redshift metrics
If you actually want to retrieve information about a standard cluster (that is, the amount of storage and memory assigned to each node, rather than current memory and storage usage), that is not available from an API call. Instead see: Amazon Redshift Clusters
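Both routes can be sketched in Python. This is a hedged example, not a complete solution: the helper names are made up, PercentageDiskSpaceUsed is used as the storage metric in the AWS/Redshift namespace, and the actual CloudWatch call is left as a comment because it needs credentials.

```python
from datetime import datetime, timedelta, timezone


def disk_usage_metric_request(cluster_identifier, hours=1, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Redshift",
        "MetricName": "PercentageDiskSpaceUsed",
        "Dimensions": [{"Name": "ClusterIdentifier", "Value": cluster_identifier}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # seconds per datapoint
        "Statistics": ["Average"],
    }


def cluster_shapes(describe_clusters_response):
    """Extract (identifier, node type, node count) from a describe_clusters response."""
    return [
        (c["ClusterIdentifier"], c["NodeType"], c["NumberOfNodes"])
        for c in describe_clusters_response["Clusters"]
    ]


# With credentials configured, the request could be sent like this:
#   import boto3
#   cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")
#   stats = cloudwatch.get_metric_statistics(**disk_usage_metric_request("mycluster"))
sample = {"Clusters": [{"ClusterIdentifier": "mycluster", "NodeType": "dw.hs1.xlarge", "NumberOfNodes": 2}]}
print(cluster_shapes(sample))
```

The node type and node count are what you would look up in the Amazon Redshift Clusters documentation to work out the storage and memory assigned to the cluster.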

ECS and Application Load Balancer

I've been looking for information on creating a CloudFormation stack with ECS and an ELB (Application Load Balancer), but haven't been able to find any.
I have created two Docker images, each containing a Node.js microservice; they listen on ports 3000 and 4000. How do I go about creating my stack with ECS and ELB as mentioned? I assume the Application Load Balancer can be configured to listen on both these ports?
A sample CloudFormation template would really help.
The Application Load Balancer can be used to balance traffic across the ECS tasks in your service(s). It has two cool features that you can leverage: dynamic port mapping (the port on the host is auto-assigned by ECS/Docker), which allows you to run multiple tasks for the same service on a single EC2 instance, and path-based routing, which allows you to route incoming requests to different services depending on patterns in the URL path.
To wire it up, you first need to define a TargetGroup like this:
"TargetGroupService1" : {
"Type" : "AWS::ElasticLoadBalancingV2::TargetGroup",
"Properties" : {
"Port": 10,
"Protocol": "HTTP",
"HealthCheckPath": "/service1",
"VpcId": {"Ref" : "Vpc"}
}
}
If you are using dynamic port mapping, the port specified in the target group is irrelevant since it will be overridden by the dynamically allocated port for each target.
Next you define a ListenerRule that defines the path that shall be routed to the TargetGroup:
"ListenerRuleService1": {
"Type" : "AWS::ElasticLoadBalancingV2::ListenerRule",
"Properties" : {
"Actions" : [
{
"TargetGroupArn" : {"Ref": "TargetGroupService1"},
"Type" : "forward"
}
],
"Conditions" : [
{
"Field" : "path-pattern",
"Values" : [ "/service1" ]
}
],
"ListenerArn" : {"Ref": "Listener"},
"Priority" : 1
}
}
Finally, you associate your ECS Service with the TargetGroup. This enables ECS to automatically register your task containers as targets in the target group (with the host port that you have configured in your TaskDefinition):
"Service1": {
"Type" : "AWS::ECS::Service",
"DependsOn": [
"ListenerRuleService1"
],
"Properties" : {
"Cluster" : { "Ref" : "ClusterName" },
"DesiredCount" : 2,
"Role" : "/ecsServiceRole",
"TaskDefinition" : {"Ref":"Task1"},
"LoadBalancers": [
{
"ContainerName": "Task1",
"ContainerPort": "8080",
"TargetGroupArn" : { "Ref" : "TargetGroupService1" }
}
]
}
}
You can find more details in a blog post I have written about this, see Amazon ECS and Application Load Balancer
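For the two microservices in the question, the TargetGroup and ListenerRule pair has to be repeated once per service. One way to avoid copy-paste mistakes is to generate the fragment, as in this Python sketch. The function name is hypothetical, the paths "/service1" and "/service2" are placeholders, and the "Vpc" and "Listener" logical ids are assumed to be defined elsewhere in the template, as in the snippets above.

```python
import json


def service_routing_resources(name, path, priority, health_check_path):
    """Return the TargetGroup and ListenerRule resources for one service."""
    target_group_id = f"TargetGroup{name}"
    return {
        target_group_id: {
            "Type": "AWS::ElasticLoadBalancingV2::TargetGroup",
            "Properties": {
                # Overridden per target when dynamic port mapping is used.
                "Port": 80,
                "Protocol": "HTTP",
                "HealthCheckPath": health_check_path,
                "VpcId": {"Ref": "Vpc"},
            },
        },
        f"ListenerRule{name}": {
            "Type": "AWS::ElasticLoadBalancingV2::ListenerRule",
            "Properties": {
                "Actions": [{"TargetGroupArn": {"Ref": target_group_id}, "Type": "forward"}],
                "Conditions": [{"Field": "path-pattern", "Values": [path]}],
                "ListenerArn": {"Ref": "Listener"},
                # Priorities must be unique among the rules of one listener.
                "Priority": priority,
            },
        },
    }


resources = {}
resources.update(service_routing_resources("Service1", "/service1", 1, "/service1"))
resources.update(service_routing_resources("Service2", "/service2", 2, "/service2"))
print(json.dumps({"Resources": resources}, indent=2))
```

Each AWS::ECS::Service then points its LoadBalancers entry at its own target group, with ContainerPort set to the port the container listens on (3000 and 4000 in the question).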
If you're interested in doing this via https://www.terraform.io/ here's an example for two apps that share a domain:
https://ratelim.it => the Rails App running on container port 8100
https://ratelim.it/api => the Java API running on container port 8080
This example supports http & https, and splits traffic between your apps based on the url prefix.
my_app_task.json
"portMappings": [
{
"hostPort": 0,
"containerPort": 8100,
"protocol": "tcp"
}
],
my_api_task.json
"portMappings": [
{
"hostPort": 0,
"containerPort": 8080,
"protocol": "tcp"
}
],
Terraform code:
## ALB for both
resource "aws_alb" "app-alb" {
name = "app-alb"
security_groups = [
"${aws_security_group.albs.id}"]
}
## ALB target for app
resource "aws_alb_target_group" "my_app" {
name = "my_app"
port = 80
protocol = "HTTP"
vpc_id = "${aws_vpc.myvpc.id}"
deregistration_delay = 30
health_check {
protocol = "HTTP"
path = "/healthcheck"
healthy_threshold = 2
unhealthy_threshold = 2
interval = 90
}
}
## ALB Listener for app
resource "aws_alb_listener" "my_app" {
load_balancer_arn = "${aws_alb.app-alb.id}"
port = "80"
protocol = "HTTP"
default_action {
target_group_arn = "${aws_alb_target_group.my_app.id}"
type = "forward"
}
}
## ALB Listener for app https
resource "aws_alb_listener" "my_app_https" {
load_balancer_arn = "${aws_alb.app-alb.id}"
port = "443"
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-2015-05"
certificate_arn = "${data.aws_acm_certificate.my_app.arn}"
default_action {
target_group_arn = "${aws_alb_target_group.my_app.id}"
type = "forward"
}
}
## ALB Target for API
resource "aws_alb_target_group" "my_api" {
name = "myapi"
port = 80
protocol = "HTTP"
vpc_id = "${aws_vpc.myvpc.id}"
deregistration_delay = 30
health_check {
path = "/api/v1/status"
healthy_threshold = 2
unhealthy_threshold = 2
interval = 90
}
}
## ALB Listener Rule for API
resource "aws_alb_listener_rule" "api_rule" {
listener_arn = "${aws_alb_listener.my_app.arn}"
priority = 100
action {
type = "forward"
target_group_arn = "${aws_alb_target_group.my_api.arn}"
}
condition {
field = "path-pattern"
values = [
"/api/*"]
}
}
## ALB Listener Rule for API HTTPS
resource "aws_alb_listener_rule" "myapi_rule_https" {
listener_arn = "${aws_alb_listener.my_app_https.arn}"
priority = 100
action {
type = "forward"
target_group_arn = "${aws_alb_target_group.my_api.arn}"
}
condition {
field = "path-pattern"
values = [
"/api/*"]
}
}
## APP Task
resource "aws_ecs_task_definition" "my_app" {
family = "my_app"
container_definitions = "${data.template_file.my_app_task.rendered}"
}
## App Service
resource "aws_ecs_service" "my_app-service" {
name = "my_app-service"
cluster = "${aws_ecs_cluster.default.id}"
task_definition = "${aws_ecs_task_definition.my_app.arn}"
iam_role = "${aws_iam_role.ecs_role.arn}"
depends_on = [
"aws_iam_role_policy.ecs_service_role_policy"]
load_balancer {
target_group_arn = "${aws_alb_target_group.my_app.id}"
container_name = "my_app"
container_port = 8100
}
}
## API Task
resource "aws_ecs_task_definition" "myapi" {
family = "myapi"
container_definitions = "${data.template_file.myapi_task.rendered}"
}
## API Service
resource "aws_ecs_service" "myapi-service" {
name = "myapi-service"
cluster = "${aws_ecs_cluster.default.id}"
task_definition = "${aws_ecs_task_definition.myapi.arn}"
iam_role = "${aws_iam_role.ecs_role.arn}"
depends_on = [
"aws_iam_role_policy.ecs_service_role_policy"]
load_balancer {
target_group_arn = "${aws_alb_target_group.my_api.id}"
container_name = "myapi"
container_port = 8080
}
}
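To see how the listener and its rule split traffic, here is a rough Python model of the routing decision. It is only an approximation: fnmatchcase stands in for the ALB's path-pattern matching (it also understands [...] character sets, which ALB patterns do not), and the rule table simply mirrors the Terraform above.

```python
from fnmatch import fnmatchcase

# (priority, path pattern, target group), mirroring aws_alb_listener_rule.api_rule
RULES = [(100, "/api/*", "my_api")]
# The listener's default_action forwards everything else to the app.
DEFAULT_TARGET = "my_app"


def route(path: str) -> str:
    """Return the target group a request path would be forwarded to."""
    for _priority, pattern, target in sorted(RULES):
        if fnmatchcase(path, pattern):  # case-sensitive match
            return target
    return DEFAULT_TARGET


for request_path in ["/api/v1/status", "/healthcheck", "/api"]:
    print(request_path, "->", route(request_path))
```

In this model a bare "/api" (no trailing slash) does not match "/api/*" and falls through to the default target, so if that path should reach the API too, an extra "/api" value in the condition is worth considering.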
Are you trying to rebuild the entire ECS stack in CF? If you can live with pre-defined clusters, you can just register the instances with user data when they spin up (I use spot fleet, but this should work anywhere you're starting an instance). Something like this in your LaunchSpecifications:
"UserData":
{ "Fn::Base64" : { "Fn::Join" : [ "", [
"#!/bin/bash\n",
"yum update -y\n",
"echo ECS_CLUSTER=YOUR_CLUSTER_NAME >> /etc/ecs/ecs.config\n",
"yum install -y aws-cli\n",
"aws ec2 create-tags --region YOUR_REGION --resources $(curl http://169.254.169.254/latest/meta-data/instance-id) --tags Key=Name,Value=YOUR_INSTANCE_NAME\n"
]]}}
I know it's not pure Infrastructure as Code, but it gets the job done with minimal effort, and my cluster configs don't really change a lot.
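If you launch instances from a script rather than from a template, the same registration step can be built in Python. This is a minimal sketch with a hypothetical helper name; it covers only the ECS_CLUSTER line from the snippet above and does by hand the base64 encoding that Fn::Base64 performs in CloudFormation.

```python
import base64


def ecs_user_data(cluster_name: str) -> str:
    """Return base64-encoded user data that joins an instance to a cluster."""
    script = "\n".join(
        [
            "#!/bin/bash",
            # The ECS agent reads the cluster name from this file on startup.
            f"echo ECS_CLUSTER={cluster_name} >> /etc/ecs/ecs.config",
            "",
        ]
    )
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


encoded = ecs_user_data("YOUR_CLUSTER_NAME")
# Decoding shows the script the instance would run at boot.
print(base64.b64decode(encoded).decode("utf-8"))
```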