Amazon Fargate - determine container startup order

Is there a way in Amazon Fargate to determine the startup order of containers?
Let's say I have two containers and I want container B to start only when A is already up.
In Amazon ECS with the EC2 launch type I could use links to achieve this, but links are not supported in Fargate's awsvpc network mode.

As of March 2019, this feature is now available! https://aws.amazon.com/about-aws/whats-new/2019/03/amazon-ecs-introduces-enhanced-container-dependency-management/
The way you do this is with the dependsOn parameter in your container definitions. For example, if you want a container to start only after a container foo has started, you specify:
"dependsOn": [
{
"containerName": "foo",
"condition": "START"
}
]

If you are using the AWS task definition wizard, you can configure this under the "Container Definitions" section. Edit (or add) your container, then scroll down to the "STARTUP DEPENDENCY ORDERING" section. You can choose from four conditions: START, COMPLETE, SUCCESS, and HEALTHY.
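For context, here is a minimal sketch of how a dependency fits into the containerDefinitions array, using the HEALTHY condition; the container names and images are placeholders, and HEALTHY only works if the dependency defines a health check:
"containerDefinitions": [
    {
        "name": "foo",
        "image": "my-foo-image",
        "essential": true,
        "healthCheck": {
            "command": ["CMD-SHELL", "curl -f http://localhost/ || exit 1"],
            "interval": 10,
            "retries": 3
        }
    },
    {
        "name": "b",
        "image": "my-b-image",
        "essential": true,
        "dependsOn": [
            {
                "containerName": "foo",
                "condition": "HEALTHY"
            }
        ]
    }
]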

There is no explicit way to control ordering in Fargate today, but it's on the roadmap.

Related

Why does ECS integration new-relic task require AmazonEC2ContainerServiceforEC2Role?

We are trying to use the AWS CloudFormation way of installing the ECS integration for our clusters with New Relic, as described in this link.
I observed that this CloudFormation template first creates a few IAM roles for the task that will be executed as a daemon service, and one of the policies attached to the task's role is AmazonEC2ContainerServiceforEC2Role, which includes permissions to operate on container instances, including deregistering the container instance.
I am interested to understand under what circumstances this daemon task would need to deregister an instance, or for that matter create a cluster or register an instance. The complete list of permissions granted is below. Can someone please elaborate on why we would need this in the first place?
I tried asking this in the New Relic discussion forums but haven't had any luck yet.
"ec2:DescribeTags", "ecs:CreateCluster", "ecs:DeregisterContainerInstance", "ecs:DiscoverPollEndpoint", "ecs:Poll", "ecs:RegisterContainerInstance", "ecs:StartTelemetrySession", "ecs:UpdateContainerInstancesState", "ecs:Submit*", "ecr:GetAuthorizationToken", "ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage", "logs:CreateLogStream", "logs:PutLogEvents"

How to reduce downtime caused by pulling images in the Kubernetes Recreate deployment strategy

Assuming I have a Kubernetes Deployment object with the Recreate strategy and I update the Deployment with a new container image version, Kubernetes will:
1. scale down/kill the existing Pods of the Deployment,
2. create the new Pods,
3. pull the new container images,
4. and finally run the new containers.
Of course, the Recreate strategy is expected to cause downtime between steps 1 and 4, where no Pod is actually running. However, step 3 can take a lot of time if the container images in question are large, or the container registry connection is slow, or both. In a test setup (Azure Kubernetes Services pulling a Windows container image from Docker Hub), I see it taking 5 minutes or more, which makes for a really long downtime.
So, what is a good option to reduce that downtime? Can I somehow get Kubernetes to pull the new images before killing the Pods in step 1 above? (Note that the solution should work with Windows containers, which are notoriously large, in case that is relevant.)
On the Internet, I have found this Codefresh article using a DaemonSet and Docker in Docker, but I guess Docker in Docker is no longer compatible with containerd.
I've also found this StackOverflow answer that suggests using an Azure Container Registry with Project Teleport, but that is in private preview and doesn't support Windows containers yet. Also, it's specific to Azure Kubernetes Services, and I'm looking for a more general solution.
Surely, this is a common problem that has a "standard" answer?
Update 2021-12-21: Because I've got a corresponding answer, I'll clarify that I cannot easily change the deployment strategy. The application in question does not support running Pods of different versions at the same time because it uses a database that needs to be migrated to the corresponding application version, without forwards or backwards compatibility.
Implement a "blue-green" deployment strategy. For instance, the service might be running and active in the "blue" state. A new deployment is created with a new container image, which deploys the "green" pods with the new container image. When all of the "green" pods are ready, the "switch live" step is run, which switches the active color. Very little downtime.
Obviously, this has tradeoffs. Your cluster will need more memory to run the additional transitional pods. The deployment process will be more complex.
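As a rough illustration of the "switch live" step, here is a minimal sketch assuming the pods carry a color label and a single Service fronts them (all names and ports are placeholders):
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    color: green   # flip between "blue" and "green" to switch live traffic
  ports:
    - port: 80
      targetPort: 8080
Because only the Service selector changes, the switch itself is nearly instantaneous once the green pods are ready.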
Via https://www.reddit.com/r/kubernetes/comments/oeruh9/can_kubernetes_prepull_and_cache_images/, I've found these ideas:
Implement a DaemonSet whose pods reference all the images I need and just sleep, so that every node pulls them.
Use http://github.com/mattmoor/warm-image, which has no Windows support.
Use https://github.com/ContainerSolutions/ImageWolf, which says, "ImageWolf is currently alpha software and intended as a PoC - please don't run it in production!"
Use https://github.com/uber/kraken, which seems to be a registry, not a pre-pulling solution.
Use https://github.com/dragonflyoss/Dragonfly (now https://github.com/dragonflyoss/Dragonfly2), which also seems to do something completely different.
Use https://github.com/senthilrch/kube-fledged, which looks exactly right and more mature than the others, but has no Windows support.
Use https://github.com/dcherman/image-cache-daemon, which has no Windows support.
Use https://goharbor.io/blog/harbor-2.1/, which also seems to be a registry, not a pre-pulling solution.
Use https://openkruise.io/docs/user-manuals/imagepulljob/, which also looks right, but a) OpenKruise is huge and I'm not sure I want to install this just to preload images, and b) it seems it has no Windows support.
So, it seems I have to implement this on my own with a DaemonSet, along the lines of the sketch below. I still hope someone can provide a better answer than this one 🙂 .
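For what it's worth, here is a minimal sketch of that DaemonSet idea. The image name is a placeholder that you would swap for the image of the upcoming release before each deployment, and the commands/images would need to match the node OS:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-prepuller
spec:
  selector:
    matchLabels:
      app: image-prepuller
  template:
    metadata:
      labels:
        app: image-prepuller
    spec:
      initContainers:
        # The pull happens here; the init container exits immediately afterwards.
        # Windows command shown; use ["sh", "-c", "exit 0"] on Linux nodes.
        - name: prepull-myapp
          image: registry.example.com/myapp:v2
          command: ["cmd", "/c", "exit 0"]
      containers:
        # A tiny long-running container keeps the DaemonSet pods "Running".
        - name: pause
          image: mcr.microsoft.com/oss/kubernetes/pause:3.6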

Timeout waiting for network interface provisioning to complete

Does anyone know why an ECS Fargate task would fail with this error?
Timeout waiting for network interface provisioning to complete. I am running an ECS Fargate task using Step Functions. The IAM role for the step function has access to the task definition, and the state machine code also looks good. The same step function worked fine before, but I just ran into this error. Why would this happen? Is it occasional?
According to AWS support, intermittent failures of this nature are to be expected (with relatively low probability).
The recommendation was to set retryAttempts > 1 to handle these situations.
This can happen if there are problems within AWS. You can view the Network Interfaces page on the EC2 console and you may see errors loading, which is an indication of API problems within EC2. You can also check status.aws.amazon.com to look for errors. Note that AWS can be slow to acknowledge problems there, so you may experience the errors before they update the status page!
AWS support has a detailed post on resolving network interface provisioning errors for ECS on Fargate. Here's an excerpt:
If the Fargate service tries to attach an elastic network interface to the underlying infrastructure that the task is meant to run on, then you can receive the following error message: "Timeout waiting for network interface provisioning to complete."
Fargate faces intermittent API issues, usually while tasks are spinning up from Step Functions or AWS Batch jobs. As recommended in another answer, you can update MaxAttempts for the retry in the state definition (note that a Retry rule also requires an ErrorEquals field):
"Retry": [
{
"MaxAttempts": 3,
}
]
Additionally, reattempts can be automated with an exponential backoff and retry logic in AWS Step Functions.
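For example, a retry rule with exponential backoff might look like this (the interval, attempt count, and backoff rate are illustrative values):
"Retry": [
    {
        "ErrorEquals": ["States.ALL"],
        "IntervalSeconds": 10,
        "MaxAttempts": 3,
        "BackoffRate": 2.0
    }
]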
I was hitting the same issue until I switched over to Fargate platform version 1.4.0.
It looks like there were some changes made to the networking side of things.
https://aws.amazon.com/blogs/containers/aws-fargate-launches-platform-version-1-4/
The default version is currently still set at 1.3.0, so maybe give 1.4.0 a try and see if it fixes it for you.

Task definitions AWS Fargate

Let us say I am defining a task definition in AWS Fargate, and this task definition would be used to start up tasks for a multi-container application consisting of two web servers. How many task definitions would I need, how many tasks would I pay for, and how many services are created?
I have read a lot of documentation, but it does not click for me. Can anyone explain the relationship between task definitions, tasks, Docker containers, services, and ECS Fargate clusters?
A task definition is a specification. You use it to define one or more containers (with image URIs) that you want to run together, along with other details such as environment variables, CPU/memory requirements, etc. The task definition doesn't actually run anything; it's a description of how things will be set up when something does run.
A task is an actual thing that is running. ECS uses the task definition to run the task: it downloads the container images and configures the runtime environment based on the other details in the task definition. You can run one or many tasks from any given task definition. Each running task is a set of one or more running containers, and the containers in a task all run on the same instance.
A service in ECS is a way to run N tasks from the same task definition, and to keep those N tasks running if they happen to shut down unexpectedly. Those N tasks can run on different instances in EC2 (although some may run on the same instance, depending on the placement strategy used for the service); on Fargate, there are no instances and the tasks "just run", so you don't have to think about placement strategies. You can also use services to connect those tasks to a load balancer, so that requests from clients inside or outside of AWS are routed evenly across all N tasks. You can update the task definition used by a service, which then triggers a rolling update (starting up and shutting down running tasks) so that all running tasks use the new version of the task definition after the deployment completes. This is used, for example, when you create a new container image and want your service to be updated to the latest version.
A service is scoped to a cluster. A cluster is really just a name. Different clusters can have different IAM policies and roles, so that you can restrict who can create services in different clusters using IAM.
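To make that concrete for the two-web-server question: if both web servers are meant to run together in each task, you would register one task definition listing both containers, create one service to keep N copies of it running, and pay for the CPU and memory of each running task. Here is a rough sketch of such a task definition (the family, names, and images are placeholders):
{
    "family": "two-web-servers",
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",
    "cpu": "512",
    "memory": "1024",
    "containerDefinitions": [
        {
            "name": "web-a",
            "image": "my-registry/web-a:latest",
            "portMappings": [{ "containerPort": 80 }],
            "essential": true
        },
        {
            "name": "web-b",
            "image": "my-registry/web-b:latest",
            "portMappings": [{ "containerPort": 8080 }],
            "essential": true
        }
    ]
}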

How to schedule ECS tasks on AWS Fargate

I have created a task definition on Elastic Container Service and have successfully run it in a Fargate cluster. However, when I create a Scheduled Task in said cluster, the option for "Launch Type" is hardcoded to EC2. Is there a way, perhaps through the command line, to schedule the task to run on Fargate?
Heads up! This is now supported in AWS:
https://aws.amazon.com/about-aws/whats-new/2018/08/aws-fargate-now-supports-time-and-event-based-task-scheduling/
Although not in all regions - as of April 2019 it still wasn't supported in eu-west-2 (London). Check the table at the top of this page to see if it's supported in the region you want: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/scheduled_tasks.html
There seems to be no way of scheduling a task on FARGATE.
The only way it can be done right now seems to be by having your 'scheduler' external to ECS. I did it with a Lambda. You can also use something like Jenkins or a simple cron task that fires the aws-cli command to ECS, though in both of those cases you will need an instance that is always running.
I wrote a Lambda that accepts the params (overrides) to be sent to the ECS task and fires on the schedule the task was supposed to have.
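The core of such an external scheduler is a single run-task call. A rough CLI equivalent of what the Lambda does (the cluster, task definition, subnet, and security group values are placeholders):
aws ecs run-task --cluster my-cluster --launch-type FARGATE --task-definition my-task:1 --network-configuration "awsvpcConfiguration={subnets=[subnet-12345],securityGroups=[sg-12345],assignPublicIp=ENABLED}"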
Update:
It seems there is now a schedule tab in the Fargate cluster details that will allow you to set cron schedules on ECS tasks.
While the AWS documentation gives you ways to do this through CloudFormation, it seems they had not yet fully released this feature. I have been trying to do something similar and ran into the same issue.
Once it does become available, this link from the AWS docs should be useful. Here's how they suggest doing it, but I keep running into errors saying NetworkConfiguration is not recognized and LaunchType is not recognized.
"EcsParameters": {
"Group": "string",
"LaunchType": "string",
"NetworkConfiguration": {
"awsvpcConfiguration": {
"AssignPublicIp": "string",
"SecurityGroups": [ "string" ],
"Subnets": [ "string" ]
}
},
Update: Here is an alternative that did end up working for me, through the aws events put-targets command on the AWS CLI!
Make sure your AWS CLI is up to date; this method fails for older versions of the CLI. Run this to update: pip install awscli --upgrade --user
After that, you should be good to go. Use the aws events put-targets --rule <value> --targets <value> command. Make sure that before you run this command you have a rule already defined in CloudWatch Events. If not, you can do that with the aws events put-rule command. Refer to the AWS docs for put-rule and for put-targets.
An example of a rule from the documentation is given below:
aws events put-rule --name "DailyLambdaFunction" --schedule-expression "cron(0 9 * * ? *)"
The put-targets command that worked for me is this:
aws events put-targets --rule cli-RS-rule --targets '{"Arn": "arn:aws:ecs:1234/cluster/clustername","EcsParameters": {"LaunchType": "FARGATE","NetworkConfiguration": {"awsvpcConfiguration": {"AssignPublicIp": "ENABLED", "SecurityGroups": [ "sg-id1233" ], "Subnets": [ "subnet-1234" ] }},"TaskCount": 1,"TaskDefinitionArn": "arn:aws:ecs:1234:task-definition/taskdef"},"Id": "sampleID111","RoleArn": "arn:aws:iam:1234:role/eventrole"}'
You can create a CloudWatch rule that uses a schedule as the event source and an ECS task as the target.
No, this is not supported yet, unfortunately. There is an open issue here. Hopefully it gets done soon, as I would like to use it as well!
Disclosure: I work for SenseDeep, which provides PowerDown at https://www.powerdown.io
Other services provide this functionality. PowerDown gives you the ability to schedule Fargate services. This is at the service level, not the task level, but it is easy to create services for tasks. For example, you could schedule a CI/CD pipeline container to run 9-5, M-F.
It's not possible to have EC2 instances and Fargate instances in the same cluster.
It's possible to schedule a Fargate instance. Create a specific service and update it from the AWS tools, e.g.:
aws ecs update-service --service my-http-service --task-definition <task-definition-family:revision>
https://docs.aws.amazon.com/cli/latest/reference/ecs/update-service.html
Useful resources:
You could use the ECS AWS tools and execute them from Lambda or Travis.
Check out this Medium post:
https://medium.com/@joseignaciocastelli92/how-to-create-a-continuous-deployment-process-using-ecs-fargate-docker-travis-410d84b4d99e
At the bottom it links this repository that has the aws commands:
https://github.com/JicLotus/ecs-farate-scripts-to-deploy-and-build
Bests