How to deploy the kinesis-video-producer Docker image from AWS's own ECR to Fargate using CDK in TypeScript? - amazon-ecs

I'm trying to stand up a proof of concept that ingests an RTSP video stream into Kinesis Video. The provided documentation has a docker image all set up that seems to have everything I need to do this, hosted by AWS on 546150905175.dkr.ecr.us-west-2.amazonaws.com. What I am having trouble with, though, is getting that deployment (via an Amplify Custom category, in TypeScript CDK) to work.
I've tried different variations on
import * as iam from "@aws-cdk/aws-iam";
import * as ecs from "@aws-cdk/aws-ecs";
import * as ec2 from "@aws-cdk/aws-ec2";
const kinesisUserAccessKey = new iam.AccessKey(this, 'KinesisStreamUserAccessKey', {
user: kinesisStreamUser,
})
const servicePrincipal = new iam.ServicePrincipal('ecs-tasks.amazonaws.com');
const executionRole = new iam.Role(this, 'IngestVideoTaskDefExecutionRole', {
assumedBy: servicePrincipal,
managedPolicies: [
iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonECSTaskExecutionRolePolicy'),
]
});
const taskDefinition = new ecs.FargateTaskDefinition(this, 'IngestVideoTaskDef', {
cpu: 512,
memoryLimitMiB: 1024,
executionRole,
})
const image = ecs.ContainerImage.fromRegistry('546150905175.dkr.ecr.us-west-2.amazonaws.com/kinesis-video-producer-sdk-cpp-amazon-linux:latest');
taskDefinition.addContainer('IngestVideoContainer', {
command: [
'gst-launch-1.0',
'rtspsrc',
`location="${locationParam.secretValue.toString()}"`,
'short-header=TRUE',
'!',
'rtph264depay',
'!',
'video/x-h264,',
'format=avc,alignment=au',
'!',
'kvssink',
`stream-name="${cfnStream.name}"`,
'storage-size=512',
`access-key="${kinesisUserAccessKey.accessKeyId}"`,
`secret-key="${kinesisUserAccessKey.secretAccessKey.toString()}"`,
`aws-region="${REGION}"`,
// `aws-region="${cdk.Aws.REGION}"`,
],
image,
logging: new ecs.AwsLogDriver({
streamPrefix: 'IngestVideoContainer',
}),
})
const service = new ecs.FargateService(this, 'IngestVideoService', {
cluster,
taskDefinition,
desiredCount: 1,
securityGroups: [
ec2.SecurityGroup.fromSecurityGroupId(this, 'DefaultSecurityGroup', SECURITY_GROUP_ID)
],
vpcSubnets: {
subnets: SUBNET_IDS.map(subnetId => ec2.Subnet.fromSubnetId(this, subnetId, subnetId)),
}
})
But regardless of what I do, an amplify push just stays 'in progress' for about an hour until I go into the CloudFormation console and cancel the stack update. After digging through the ECS console, I managed to find an actual error message:
Resourceinitializationerror: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.us-west-2.amazonaws.com/": dial tcp 52.94.177.118:443: i/o timeout
It seems to be some kind of networking issue, but I'm not sure how to proceed. Any assistance you can provide would be wonderful. Cheers!

Figured it out. For those stuck with similar issues: the task definition needs an execution role with AmazonECSTaskExecutionRolePolicy (which I have already edited into the code above), and the service needs assignPublicIp: true.
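The i/o timeout in the error means the task had no network path to the ECR API, which is why assignPublicIp made the difference. As a rough sketch of that reasoning (TaskNetwork and canReachEcr are hypothetical names for illustration, not CDK or ECS types, and the question's subnets are assumed to be public since assigning a public IP fixed the problem), a Fargate task can pull an image if at least one of these routes exists:

```typescript
// Hypothetical model of a Fargate task's networking; not a CDK or ECS type.
interface TaskNetwork {
  /** The task's subnet has a default route to an internet gateway ("public" subnet). */
  subnetRoutesToInternetGateway: boolean;
  /** Mirrors the assignPublicIp setting on the service. */
  assignPublicIp: boolean;
  /** The task's subnet has a default route to a NAT gateway or NAT instance. */
  subnetRoutesToNat: boolean;
  /** The VPC has private endpoints for ECR (plus S3 for the image layers). */
  hasEcrVpcEndpoints: boolean;
}

function canReachEcr(net: TaskNetwork): boolean {
  // A public subnet only helps if the task actually gets a public IP.
  const viaPublicIp = net.subnetRoutesToInternetGateway && net.assignPublicIp;
  return viaPublicIp || net.subnetRoutesToNat || net.hasEcrVpcEndpoints;
}

// Assumed shape of the setup in the question: public subnets, no public IP requested.
const before: TaskNetwork = {
  subnetRoutesToInternetGateway: true,
  assignPublicIp: false,
  subnetRoutesToNat: false,
  hasEcrVpcEndpoints: false,
};
const after: TaskNetwork = { ...before, assignPublicIp: true };

console.log(canReachEcr(before), canReachEcr(after)); // prints: false true
```

The execution role is a separate requirement: even with a working route, the pull fails without the ECR permissions that AmazonECSTaskExecutionRolePolicy grants.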

Related

In CDK, can I wait until a Helm-installed operator is running before applying a manifest?

I'm installing the External Secrets Operator alongside Apache Pinot into an EKS cluster using CDK. I'm running into an issue that I think is being caused by CDK attempting to create a resource defined by the ESO before the ESO has actually gotten up and running. Here's the relevant code:
// install Pinot
const pinot = cluster.addHelmChart('Pinot', {
chartAsset: new Asset(this, 'PinotChartAsset', { path: path.join(__dirname, '../pinot') }),
release: 'pinot',
namespace: 'pinot',
createNamespace: true
});
// install the External Secrets Operator
const externalSecretsOperator = cluster.addHelmChart('ExternalSecretsOperator', {
chart: 'external-secrets',
release: 'external-secrets',
repository: 'https://charts.external-secrets.io',
namespace: 'external-secrets',
createNamespace: true,
values: {
installCRDs: true,
webhook: {
port: 9443
}
}
});
// create a Fargate Profile
const fargateProfile = cluster.addFargateProfile('FargateProfile', {
fargateProfileName: 'externalsecrets',
selectors: [{ 'namespace': 'external-secrets' }]
});
// create the Service Account used by the Secret Store
const serviceAccount = cluster.addServiceAccount('ServiceAccount', {
name: 'eso-service-account',
namespace: 'external-secrets'
});
serviceAccount.addToPrincipalPolicy(new iam.PolicyStatement({
effect: iam.Effect.ALLOW,
actions: [
'secretsmanager:GetSecretValue',
'secretsmanager:DescribeSecret'
],
resources: [
'arn:aws:secretsmanager:us-east-1:<MY-ACCOUNT-ID>:secret:*'
]
}))
serviceAccount.node.addDependency(externalSecretsOperator);
// create the Secret Store, an ESO Resource
const secretStoreManifest = getSecretStoreManifest(serviceAccount);
const secretStore = cluster.addManifest('SecretStore', secretStoreManifest);
secretStore.node.addDependency(serviceAccount);
secretStore.node.addDependency(fargateProfile);
// create the External Secret, another ESO resource
const externalSecretManifest = getExternalSecretManifest(secretStoreManifest)
const externalSecret = cluster.addManifest('ExternalSecret', externalSecretManifest)
externalSecret.node.addDependency(secretStore);
externalSecret.node.addDependency(pinot);
Even though I've set the ESO as a dependency to the Secret Store, when I try to deploy this I get the following error:
Received response status [FAILED] from custom resource. Message returned: Error: b'Error from server (InternalError): error when creating "/tmp/manifest.yaml": Internal error occurred:
failed calling webhook "validate.clustersecretstore.external-secrets.io": Post "https://external-secrets-webhook.external-secrets.svc:443/validate-external-secrets-io-v1beta1-clusterse
cretstore?timeout=5s": no endpoints available for service "external-secrets-webhook"\n'
If I understand correctly, this is the error you'd get if you try to add a Secret Store before the ESO is fully installed. I'm guessing that CDK does not wait until the ESO's pods are running before attempting to apply the manifest. Furthermore, if I comment out the lines that create the Secret Store and External Secret, do a cdk deploy, uncomment those lines and then deploy again, everything works fine.
Is there any way around this? Some way I can retry applying the manifest, or to wait a period of time before attempting the apply?
The addHelmChart method has a property wait that is set to false by default. Setting it to true tells CDK not to mark the installation as complete until all of its K8s resources are in a ready state.
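Applied to the chart from the question, that might look like the sketch below. It assumes CDK v2 (aws-cdk-lib), since the question does not show its imports; the function name is hypothetical, and the chart options are copied from the question with only wait added.

```typescript
import * as eks from "aws-cdk-lib/aws-eks";

// Hypothetical helper: installs the External Secrets Operator and blocks
// until Helm reports its resources as ready.
export function installExternalSecretsOperator(cluster: eks.ICluster): eks.HelmChart {
  return cluster.addHelmChart("ExternalSecretsOperator", {
    chart: "external-secrets",
    release: "external-secrets",
    repository: "https://charts.external-secrets.io",
    namespace: "external-secrets",
    createNamespace: true,
    // Do not mark the release as successful until its pods and services are ready.
    wait: true,
    values: {
      installCRDs: true,
      webhook: {
        port: 9443,
      },
    },
  });
}
```

The addDependency calls already in the question would then hold the Secret Store and External Secret manifests back until the webhook has endpoints.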

CannotPullContainerError: failed to extract layer

I'm trying to run a task in a Windows container in Fargate mode on AWS.
The container is a .NET console application (full Framework 4.5).
This is the task definition, generated programmatically with the SDK:
var taskResponse = await ecsClient.RegisterTaskDefinitionAsync(new Amazon.ECS.Model.RegisterTaskDefinitionRequest()
{
RequiresCompatibilities = new List<string>() { "FARGATE" },
TaskRoleArn = TASK_ROLE_ARN,
ExecutionRoleArn = EXECUTION_ROLE_ARN,
Cpu = CONTAINER_CPU.ToString(),
Memory = CONTAINER_MEMORY.ToString(),
NetworkMode = NetworkMode.Awsvpc,
Family = "netfullframework45consoleapp-task-definition",
EphemeralStorage = new EphemeralStorage() { SizeInGiB = EPHEMERAL_STORAGE_SIZE_GIB },
ContainerDefinitions = new List<Amazon.ECS.Model.ContainerDefinition>()
{
new Amazon.ECS.Model.ContainerDefinition()
{
Name = "netfullframework45consoleapp-task-definition",
Image = "XXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/netfullframework45consoleapp:latest",
Cpu = CONTAINER_CPU,
Memory = CONTAINER_MEMORY,
Essential = true
//I REMOVED THE LOG DEFINITION TO SIMPLIFY THE PROBLEM
//,
//LogConfiguration = new Amazon.ECS.Model.LogConfiguration()
//{
// LogDriver = LogDriver.Awslogs,
// Options = new Dictionary<string, string>()
// {
// { "awslogs-create-group", "true"},
// { "awslogs-group", $"/ecs/{TASK_DEFINITION_NAME}" },
// { "awslogs-region", AWS_REGION },
// { "awslogs-stream-prefix", $"{TASK_DEFINITION_NAME}" }
// }
//}
}
}
});
This is the policy (AmazonECSTaskExecutionRolePolicy) attached to the role used by the task:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
I get this error when I launch the task:
CannotPullContainerError: ref pull has been retried 1 time(s): failed to extract layer sha256:fe48cee89971abac42eedb9110b61867659df00fc5b0b90dd91d6e19f704d935: link /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/212/fs/Files/ProgramData/Microsoft/Event Viewer/Views/ServerRoles/RemoteDesktop.Events.xml /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/212/fs/Files/Windows/Microsoft.NET/assembly/GAC_64/Microsoft.Windows.ServerManager.RDSPlugin/v4.0_10.0.0.0__31bf3856ad364e35/RemoteDesktop.Events.xml: no such file or directory: unknown
Some searching led me here:
https://aws.amazon.com/it/premiumsupport/knowledge-center/ecs-pull-container-api-error-ecr/
Point 1 says that if I run the task in a private subnet (as I'm doing), I need a NAT with a matching route to guarantee communication with ECR. Note, however, that my infrastructure has a VPC endpoint for ECR.
So the first question is: is a VPC endpoint sufficient to guarantee communication from the container to the container image registry (ECR)? Or do I necessarily need to implement what point 1 says (a NAT and a route in the route table), or else run the task in a public subnet?
Could the error be related to missing connectivity to ECR, or could it be a missing-policy problem?
Make sure your VPC endpoint is configured correctly. Note that
"Amazon ECS tasks hosted on Fargate using platform version 1.4.0 or later require both the com.amazonaws.region.ecr.dkr and com.amazonaws.region.ecr.api Amazon ECR VPC endpoints as well as the Amazon S3 gateway endpoint to take advantage of this feature."
See https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-endpoints.html for more information
In the first paragraph of the page I linked: "You don't need an internet gateway, a NAT device, or a virtual private gateway."
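To check a VPC against that requirement, it can help to spell out the three service names for a given region. A minimal sketch follows; the function names are hypothetical, the ECR names follow the pattern in the quote above, and the S3 gateway endpoint is assumed to use the usual com.amazonaws.<region>.s3 service name.

```typescript
// Endpoint service names a Fargate task (platform version 1.4.0 or later)
// needs in order to pull from ECR without a NAT or internet gateway.
function requiredEndpointsForFargatePull(region: string): string[] {
  return [
    `com.amazonaws.${region}.ecr.dkr`, // image pulls
    `com.amazonaws.${region}.ecr.api`, // ECR API calls such as auth tokens
    `com.amazonaws.${region}.s3`, // gateway endpoint; image layers are stored in S3
  ];
}

// Returns the required service names that are not in the list of endpoints
// the VPC already has.
function missingEndpoints(region: string, existing: string[]): string[] {
  const present = new Set(existing);
  return requiredEndpointsForFargatePull(region).filter((name) => !present.has(name));
}

// Example: a VPC in eu-west-1 that only has the ecr.dkr endpoint
// is still missing the ecr.api and s3 endpoints.
console.log(missingEndpoints("eu-west-1", ["com.amazonaws.eu-west-1.ecr.dkr"]));
```

The list of existing endpoints would come from the VPC console or an equivalent listing of the VPC's endpoints; the check itself is only string comparison.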

aws-ecs-patterns error: Cluster for this service needs Ec2 capacity. Call addXxxCapacity() on the cluster

Hoping someone can help me here, according to AWS CDK documentation if I declare my VPC then I shouldn't declare 'capacity', but when I run cdk synth I get the following error...
throw new Error(`Validation failed with the following errors:\n ${errorList}`);
Error: Validation failed with the following errors:
[PrerenderInfrasctutureStack/preRenderApp/Service] Cluster for this service needs Ec2 capacity. Call addXxxCapacity() on the cluster.
here is my code...
(i hope Nathan Peck sees this)
const cdk = require('@aws-cdk/core');
const ec2 = require('@aws-cdk/aws-ec2');
const ecsPattern = require('@aws-cdk/aws-ecs-patterns');
const ecs = require('@aws-cdk/aws-ecs');
class PrerenderInfrasctutureStack extends cdk.Stack {
/**
*
* @param {cdk.Construct} scope
* @param {string} id
* @param {cdk.StackProps=} props
*/
constructor(scope, id, props) {
super(scope, id, props);
const myVPC = ec2.Vpc.fromLookup(this, 'publicVpc', {
vpcId:'vpc-xxx'
});
const preRenderApp = new ecsPattern.ApplicationLoadBalancedEc2Service(this, 'preRenderApp', {
vpcId: myVPC,
certificate: 'arn:aws:acm:ap-southeast-2:xxx:certificate/xxx', // because this is specified, the LB will automatically use HTTPS
domainName: 'my-dev.com.au.',
domainZone:'my-dev.com.au',
listenerPort: 443,
publicLoadBalancer: true,
memoryReservationMiB: 8,
cpu: 4096,
desiredCount: 1,
taskImageOptions:{
image: ecs.ContainerImage.fromRegistry('xxx.dkr.ecr.region.amazonaws.com/express-prerender-server'),
containerPort: 3000
},
});
}
}
module.exports = { PrerenderInfrasctutureStack }
This is because if you don't explicitly pass a cluster then it uses the default cluster that exists on your account. However the default cluster starts out with no EC2 capacity, since EC2 instances cost money when they run. You can use the empty default cluster with Fargate mode since Fargate does not require EC2 capacity, it just runs your container inside Fargate, but the default cluster won't work with EC2 mode until you add EC2 instances to the cluster.
The easy solution here is to switch to ApplicationLoadBalancedFargateService instead, because Fargate services run using Fargate capacity, so they don't require EC2 instances in the cluster. Alternatively you should define your own cluster using something like:
// Create an ECS cluster
const cluster = new ecs.Cluster(this, 'Cluster', {
  vpc,
});
// Add capacity to it
cluster.addCapacity('DefaultAutoScalingGroupCapacity', {
  instanceType: new ec2.InstanceType("t2.xlarge"),
  desiredCapacity: 3,
});
Then pass that cluster as a property when creating the ApplicationLoadBalancedEc2Service
Hope this helps!

How to not rebuild a DockerImageAsset at every deploy using aws-cdk in TypeScript?

My app is a Python API that I package as a Docker image and use with ECS Fargate (Spot Instances). The code below works.
My issue is that it rebuilds the entire image every time I deploy this – which is very time-consuming (downloads all dependencies, makes the image, uploads, etc). I want it to reuse the exact same image uploaded to ECR by aws-cdk itself.
Is there a way (env variable or else) for me to skip this when I don't touch the app's code and just make changes to the stack?
#!/usr/bin/env node
import * as cdk from "@aws-cdk/core"
import * as ecs from "@aws-cdk/aws-ecs"
import * as ec2 from "@aws-cdk/aws-ec2"
import * as ecrassets from "@aws-cdk/aws-ecr-assets"
// See https://docs.aws.amazon.com/cdk/api/latest/docs/aws-ecs-readme.html
export class Stack extends cdk.Stack {
constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props)
/**
* Repository & Image
*/
const apiDockerImage = new ecrassets.DockerImageAsset(
this,
`my-api-image`,
{
directory: `.`,
exclude: [`cdk.out`, `cdk`, `.git`]
}
)
/**
* Cluster
*/
const myCluster = new ecs.Cluster(this, "Cluster", {})
// Add Spot Capacity to the Cluster
myCluster.addCapacity(`spot-auto-scaling-group-capacity`, {
maxCapacity: 2,
minCapacity: 1,
instanceType: new ec2.InstanceType(`r5a.large`),
spotPrice: `0.0400`,
spotInstanceDraining: true
})
// A task Definition describes what a single copy of a task should look like
const myApiFargateTaskDefinition = new ecs.FargateTaskDefinition(
this,
`api-fargate-task-definition`,
{
cpu: 2048,
memoryLimitMiB: 8192,
}
)
// Add image to task def
myApiFargateTaskDefinition.addContainer(`api-container`, {
image: ecs.ContainerImage.fromEcrRepository(
apiDockerImage.repository,
`latest`
),
})
// And the service attaching the task def to the cluster
const myApiService = new ecs.FargateService(
this,
`my-api-fargate-service`,
{
cluster: myCluster,
taskDefinition: myApiFargateTaskDefinition,
desiredCount: 1,
assignPublicIp: true,
}
)
}
}
The proper solution is to build your image outside of this deployment process and just get a reference to that image in ECR.
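One way to do that is sketched below, using the CDK v1 packages the question already imports. The repository name my-api and the imageTag parameter are placeholders, the helper name is hypothetical, and the image is assumed to have been built and pushed by a separate step (a CI job, for example).

```typescript
import * as cdk from "@aws-cdk/core";
import * as ecr from "@aws-cdk/aws-ecr";
import * as ecs from "@aws-cdk/aws-ecs";

// Hypothetical helper: returns a container image that points at an
// already-pushed tag instead of building a DockerImageAsset.
export function prebuiltApiImage(scope: cdk.Construct, imageTag: string): ecs.ContainerImage {
  // Reference an existing repository by name; nothing is built or uploaded here.
  const repository = ecr.Repository.fromRepositoryName(scope, "ApiRepository", "my-api");
  // Pin an explicit tag so the task definition only changes when the tag does.
  return ecs.ContainerImage.fromEcrRepository(repository, imageTag);
}
```

The returned value would replace the image passed to addContainer in the question, so a stack-only change no longer triggers a Docker build.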

cloudify custom workflow missing cloudify_agent runtime information

I want to develop my own workflow named "backup" in Cloudify with my own plugin, but when I ran that workflow, the following error occurred:
'backup' workflow execution failed: RuntimeError: Workflow failed: Task failed 'script_runner.tasks.run' -> Missing cloudify_agent runtime information. This most likely means that the Compute node never started successfully
I don't understand why. Can anybody help me solve this problem?
Here are my main blueprint and my plugin code.
My main blueprint:
tosca_definitions_version: cloudify_dsl_1_2
imports:
  - plugins/backup.yaml
  - types/types.yaml
node_templates:
  mynode:
    type: cloudify.nodes.Compute
    properties:
      ip: "ip"
      agent_config:
        install_method: none
        user: "user"
        key: "key_uri"
  myapp:
    type: cloudify.nodes.ApplicationModule
    interfaces:
      test_platform_backup:
        backup:
          implementation: scripts/backup.sh
          inputs:
            port: 6969
        post_backup:
          implementation: scripts/post_backup.sh
    relationships:
      - type: cloudify.relationships.contained_in
        target: mynode
My plugin code:
from cloudify.decorators import workflow
from cloudify.workflows import ctx
from cloudify.workflows.tasks_graph import forkjoin
@workflow
def backup(operation, type_name, operation_kwargs, is_node_operation, **kwargs):
    graph = ctx.graph_mode()
    send_event_starting_tasks = {}
    send_event_done_tasks = {}
    for node in ctx.nodes:
        if type_name in node.type_hierarchy:
            for instance in node.instances:
                send_event_starting_tasks[instance.id] = instance.send_event('Starting to run operation')
                send_event_done_tasks[instance.id] = instance.send_event('Done running operation')
    for node in ctx.nodes:
        if type_name in node.type_hierarchy:
            for instance in node.instances:
                sequence = graph.sequence()
                if is_node_operation:
                    operation_task = instance.execute_operation(operation, kwargs=operation_kwargs)
                else:
                    forkjoin_tasks = []
                    for relationship in instance.relationships:
                        forkjoin_tasks.append(relationship.execute_source_operation(operation))
                        forkjoin_tasks.append(relationship.execute_target_operation(operation))
                    operation_task = forkjoin(*forkjoin_tasks)
                sequence.add(
                    send_event_starting_tasks[instance.id],
                    operation_task,
                    send_event_done_tasks[instance.id])
    for node in ctx.nodes:
        for instance in node.instances:
            for rel in instance.relationships:
                instance_starting_task = send_event_starting_tasks.get(instance.id)
                target_done_task = send_event_done_tasks.get(rel.target_id)
                if instance_starting_task and target_done_task:
                    graph.add_dependency(instance_starting_task, target_done_task)
    return graph.execute()
It seems that your VM did not start.
From your code I can't understand what you are trying to do.
You don't install an agent and you don't have a Fabric connection to the VM, yet you are trying to run operations on the VM.
You should either install an agent (e.g. by removing the "install_method: none"), or add a Fabric connection to the VM and run the operations through it.