I am trying to restart my ECS service. Whenever I issue the command below, ECS takes 5 minutes to restart the service.
aws ecs update-service --service name --force-new-deployment
To be clear, ECS restarts the service only after 5 minutes. The 5-minute wait seems to be the 'draining' phase. However, my application is not even processing a single request, so why is it waiting for 5 minutes?
How can I force an immediate restart from the command line?
The command aws ecs update-service --service name --force-new-deployment executes a simple process:
It starts a new task. Once the health check marks the new task as healthy, the service begins draining the old task and the load balancer starts diverting connections to the new task. This is why it takes a few minutes.
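The five-minute wait most likely corresponds to the target group's deregistration delay, which defaults to 300 seconds on Elastic Load Balancing target groups. Below is a minimal sketch of lowering it with boto3, assuming the service sits behind an ALB or NLB target group. The function name, the ARN placeholder, and the 30-second value are illustrative only; the client is passed in so the function can be tried without AWS access.

```python
def set_deregistration_delay(elbv2_client, target_group_arn, seconds):
    """Set how long the load balancer keeps draining a deregistered target.

    elbv2_client is expected to be boto3.client("elbv2") or a stand-in
    exposing the same modify_target_group_attributes method.
    """
    # The attribute accepts values from 0 to 3600 seconds.
    if not 0 <= seconds <= 3600:
        raise ValueError("deregistration delay must be between 0 and 3600 seconds")
    return elbv2_client.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=[
            # Attribute values are passed as strings.
            {"Key": "deregistration_delay.timeout_seconds", "Value": str(seconds)}
        ],
    )


# Example call (placeholder ARN, requires AWS credentials):
#   import boto3
#   set_deregistration_delay(boto3.client("elbv2"), "<target-group-arn>", 30)
```

A shorter delay speeds up deployments, but in-flight requests on the old task get less time to finish, so the value is a trade-off rather than a free win.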
If you want to stop immediately, you need to use aws ecs stop-task and then start the new task with aws ecs start-task or aws ecs run-task. But you will have downtime.
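A rough sketch of the stop-everything approach with boto3 is shown here. The function name and the default reason string are made up for illustration, and the client is injected so the logic can be exercised with a stand-in. If the tasks belong to a service, the service scheduler should start replacements on its own to get back to the desired count, so an explicit run-task may not be needed in that case.

```python
def stop_service_tasks(ecs_client, cluster, service, reason="Forced restart"):
    """Stop every running task of a service and return the stopped task ARNs.

    ecs_client is expected to be boto3.client("ecs") or a stand-in that
    exposes list_tasks and stop_task with the same keyword arguments.
    """
    # Collect all running task ARNs first, following nextToken pagination.
    task_arns = []
    kwargs = {"cluster": cluster, "serviceName": service, "desiredStatus": "RUNNING"}
    while True:
        page = ecs_client.list_tasks(**kwargs)
        task_arns.extend(page.get("taskArns", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token

    # Stopping sends SIGTERM, then SIGKILL after the stop timeout.
    for task_arn in task_arns:
        ecs_client.stop_task(cluster=cluster, task=task_arn, reason=reason)
    return task_arns


# Example call (placeholder names, requires AWS credentials):
#   import boto3
#   stop_service_tasks(boto3.client("ecs"), "my-cluster", "my-service")
```

As the answer says, this causes downtime: nothing serves traffic between the moment the old tasks stop and the moment the new ones pass their health checks.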
Related
I have a cluster running two services: a web app and a message queue handler. The app has autoscaling configured, but the queue handler does not. There are also a couple of scheduled tasks that run.
If I run a task manually using the CLI (aws ecs run-task), the task works properly at first, then after about 5 minutes loses the ability to make outbound connections.
From looking at the scaling logs, it doesn't look like autoscaling would be causing this issue.
I have an ECS cluster (Fargate), task, and service that I have had set up in Terraform for at least a year. I haven't touched it for a long while. My normal deployment for updating the code is to push a new container to the registry and then stop all tasks on the cluster with a script. Today, my service did not run a new task in response to that task being stopped. Its desired count is fixed at so it should.
I have gone in and tried to run this manually, and I'm seeing this error.
Unable to run task
Http request timed out enforced after 4999ms
When I try to do this, a new stopped task is added to my stopped tasks list. When I look into that task, the stopped reason is "Deployment restart", and two of them are now showing "Task provisioning failed.", which I think might be tasks the service tried to start. But these tasks do not show a started timestamp. The ones I start in the console have a started timestamp.
My site is now down and I can't get it back up. Does anyone know of a way to debug this? Is AWS ECS experiencing problems right now? I checked the health monitors and I see no issues.
This was an AWS outage affecting Fargate in us-east-1. It's fixed now.
I've created a simple task to print a hello world. I've created an ECR image, a docker compose file and ecs-params.yml.
I get the CloudWatch log for the print, but the task keeps launching every minute, which I guess is due to the REPLICA service type.
How can I stop this from happening? I want to launch this Fargate task ONLY from a Lambda, and when it finishes I don't want it to be relaunched.
Thanks in advance
If you want a one-shot / one-off / standalone task to be launched by ECS and have it run until it finishes, you wouldn't use an ECS service definition but merely a task.
You can run tasks on their own without packaging as an ECS service.
See: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_run_task.html
If you are using the ECS CLI, then there is also ecs-cli compose create. So, you would use that call and not the one also creating an ECS service along with it.
You can then use AWS Lambda and send an ecs:RunTask AWS API call to invoke/start the ECS task.
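As a hedged illustration, a Lambda handler that starts the task once with boto3 might look like the following. The environment variable names (ECS_CLUSTER, ECS_TASK_DEFINITION, ECS_SUBNETS) and the helper build_run_task_request are assumptions made for this sketch, not anything ECS requires, and the Lambda role would need permission for ecs:RunTask (plus iam:PassRole for the task roles).

```python
import os


def build_run_task_request(cluster, task_definition, subnets, security_groups=None):
    """Build keyword arguments for the ECS RunTask call on Fargate."""
    network = {
        "subnets": list(subnets),
        # Use "ENABLED" if the task runs in a public subnet and needs
        # internet access, for example to pull its image.
        "assignPublicIp": "DISABLED",
    }
    if security_groups:
        network["securityGroups"] = list(security_groups)
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {"awsvpcConfiguration": network},
    }


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime.
    import boto3

    request = build_run_task_request(
        cluster=os.environ["ECS_CLUSTER"],
        task_definition=os.environ["ECS_TASK_DEFINITION"],
        subnets=os.environ["ECS_SUBNETS"].split(","),
    )
    response = boto3.client("ecs").run_task(**request)
    return {
        "taskArns": [task["taskArn"] for task in response["tasks"]],
        "failures": response["failures"],
    }
```

Because no service is involved, nothing relaunches the task after the container exits.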
I have a batch process, written in PHP and embedded in a Docker container. Basically, it loads data from several web services, does some computation on the data (for about 1 hour), and posts the computed data to another web service; then the container exits (with a return code of 0 if OK, 1 if something failed during the process). During the process, some logs are written to STDOUT or STDERR. The batch must be triggered once a day.
I was wondering what the best AWS service is to schedule, execute, and monitor my batch process:
at the very beginning, I used an EC2 machine with a crontab: no high availability there, so I decided to switch to a more PaaS-like approach.
then, I used Elastic Beanstalk for Docker, with a non-functional web server (only to reply to the health check) and a crontab inside the container to wake up my batch command once a day. With an autoscaling rule of min=1 max=1, I have HA (if the container or the VM crashes, it is restarted by AWS).
but now, to be more efficient, I decided to move to ECS and take an approach where I do not need to keep EC2 instances awake 23 hours out of 24 for nothing. So I tried Fargate.
with Fargate, I defined my task (Fargate type, not the EC2 type) and configured everything on it.
I created a cluster to run my task: I can run my task "by hand, one time", so I know all the settings are correct.
Now, going deeper in Fargate, I want to have my task executed once a day.
It seems to work fine when I use the Scheduled Task feature of ECS: the container starts on time, the process runs, then the container stops. But CloudWatch is missing some metrics: CPUReservation and CPUUtilization are not reported. Also, there is no way to know whether the batch quit with exit code 0 or 1 (every execution stopped with status "STOPPED"). So I can't send a CloudWatch alarm if the container execution failed.
I also tried the "Services" feature of Fargate, but it can't handle a batch process, because the container is restarted every time it stops. This is normal, because the container does not run a daemon. There is no way to schedule a service. I want my container to be active only when it needs to work (once a day, for at most 1 hour). With a service, however, the missing metrics are correctly reported in CloudWatch.
Here is my question: what are the most suitable AWS managed services to trigger a container once a day, let it run to do its task, and have reporting facilities to track execution (CPU usage, batch duration), including an alarm (SNS) when the task fails?
We had the same issue with identifying failed jobs. I suggest you take a look at AWS Batch, where logs for FAILED jobs are available in CloudWatch Logs.
One more thing you should consider is total cost of ownership of whatever solution you choose eventually. Fargate, in this regard, is quite expensive.
It may be too late for your project, but I thought it could still benefit others.
Have you had a look at AWS Step Functions? It is possible to define a workflow and start tasks on ECS/Fargate (or jobs on EKS for that matter), wait for the results and raise alarms/send emails...
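To give an idea of what such a workflow could look like, here is a sketch that builds an Amazon States Language definition in Python. It assumes the ecs:runTask.sync service integration (which waits for the task to stop) and the sns:publish integration; as far as I know, the .sync variant fails the state when the task does not complete successfully, which is what lets the Catch branch send the notification. The state names and the function build_state_machine are hypothetical, and all ARNs and subnets are supplied by the caller.

```python
import json


def build_state_machine(cluster_arn, task_definition_arn, subnets, topic_arn):
    """Return a state machine definition: run the batch, notify on failure."""
    return {
        "Comment": "Run the daily batch on Fargate and notify on failure",
        "StartAt": "RunBatch",
        "States": {
            "RunBatch": {
                "Type": "Task",
                # .sync makes Step Functions wait until the ECS task stops.
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "LaunchType": "FARGATE",
                    "Cluster": cluster_arn,
                    "TaskDefinition": task_definition_arn,
                    "NetworkConfiguration": {
                        "AwsvpcConfiguration": {
                            "Subnets": list(subnets),
                            "AssignPublicIp": "DISABLED",
                        }
                    },
                },
                # Any error in the task state is routed to the notification.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
                "End": True,
            },
            "NotifyFailure": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                    "TopicArn": topic_arn,
                    "Message": "Daily batch task failed",
                },
                "Next": "Failed",
            },
            # Mark the execution itself as failed after notifying.
            "Failed": {"Type": "Fail"},
        },
    }


# The JSON string is what would be passed as the state machine definition.
definition_json = json.dumps(
    build_state_machine("<cluster-arn>", "<task-definition-arn>", ["<subnet-id>"], "<topic-arn>"),
    indent=2,
)
```

The daily trigger would still have to come from a schedule that starts the state machine; that part is left out here.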
For EC2 launch type I'm able to check agent configuration in /etc/ecs/ecs.config file at EC2 container instance. But is it possible to find out the same info at ECS Fargate Task? For example, I'd like to know, what is the timeout between SIGTERM and SIGKILL (ECS_CONTAINER_STOP_TIMEOUT). I wonder should it be possible to retrieve such info from Amazon ECS Task Metadata Endpoint?
In Fargate, the timeout between SIGTERM and SIGKILL is 30 seconds by default, the same as the default for ECS_CONTAINER_STOP_TIMEOUT.
For newer Fargate platform versions, you can use the stopTimeout container definition parameter. Note the maximum value of 120 seconds:
For tasks that use the Fargate launch type, the task or service requires platform version 1.3.0 or later (Linux) or 1.0.0 or later (for Windows). The max stop timeout value is 120 seconds. However, if the parameter isn't specified, the default value of 30 seconds is used.
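For illustration, a container definition carrying stopTimeout could be built like this before being passed in the containerDefinitions list of a task definition. The helper name and the example container name and image are placeholders; the 120-second cap and the 30-second default come from the documentation quoted above.

```python
FARGATE_MAX_STOP_TIMEOUT = 120  # seconds, per the quoted documentation


def container_definition(name, image, stop_timeout=30):
    """Build a container definition dict with an explicit stopTimeout."""
    if not 0 <= stop_timeout <= FARGATE_MAX_STOP_TIMEOUT:
        raise ValueError(
            f"stopTimeout on Fargate must be between 0 and {FARGATE_MAX_STOP_TIMEOUT} seconds"
        )
    return {
        "name": name,
        "image": image,
        "essential": True,
        # Seconds ECS waits after SIGTERM before sending SIGKILL.
        "stopTimeout": stop_timeout,
    }


# Placeholder values for illustration only.
example = container_definition("worker", "<account>.dkr.ecr.<region>.amazonaws.com/worker:latest", 90)
```

The resulting dict is one entry of containerDefinitions when registering the task definition, for example through boto3's register_task_definition.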