Fargate: how to stop task after job done? - amazon-ecs

I need to calculate the task on my Fargate cluster and after finishing calculating the task should be stopped or terminated to decrease payments.
The consequence of my actions:
One task always running on EC2 Cluster and checking DB to the new data.
If new data appears, Boto3 runs the Fargate container.
After the job is done, the task should be stopped.
Also if in the DB appears the second row of data during proceeding the first task, Fargate should create a second task for the second job. and then stop tasks...
So, I have:
Task written on Python and deployed on ECR
Fargate cluster
Task definition (describe Memory, CPU's and container)
How task know, that it should be closed? Any idea or solution to stop task after job done?

Related

Airflow kubernetes architecture understanding

I'm trying to understand the Arch of Airflow on Kubernetes.
Using the helm and Kubernetes executor, the installation mounts 3 pods called: Trigger, WebServer, and Scheduler...
When I run a dag using the Kubernetes pod operator, it also mounts 2 pods more: one with the dag name and another one with the task name...
I want to understand the communication between pods... So far I know the only the expressed in the image:
Note: I'm using the git sync option
Thanks in advance for the help that you can give me!!
Airflow application has some components that require for it to operate normally: Webserver, Database, Scheduler, Trigger, Worker(s), Executor. You can read about it here.
Lets go over the options:
Kubernetes Executor (As you choose):
In your instance since you are deploying on Kubernetes with Kubernetes Executor then each task being executed is a pod. Airflow wraps the task with a Pod no matter what task it is. This brings to the front the isolation that Kubernetes offer, this also bring the overhead of creating a pod for each task. Choosing Kubernetes Executor often goes with case where many/most of your tasks takes long time to execute - as if your tasks takes 5 seconds to complete it might not be worth to pay the overhead of creating pod for each task. As for what you see as the DAG -> Task1 in your diagram. Consider that the Scheduler launches the Airflow workers. The workers are starting the tasks in new pods. So the worker needs to monitor the execution of the task.
Celery Executor - Setting up a Worker/Pod which tasks can run in it. This gives you speed as there is no need to create pod for each task but there is no isolation for each task. Noting that using this executor doesn't mean that you can't run tasks in their own Pod. User can run KubernetesPodOperator and it will create a Pod for the task.
CeleryKubernetes Executor - Enjoying both worlds. You decide which tasks will be executed by Celery and which by Kubernetes. So for example you can set small short tasks to Celery and longer tasks to Kubernetes.
How will it look like Pod wise?
Kubernetes Executor - Every task creates a pod. PythonOperator, BashOperator - all of them will be wrapped with pods (user doesn't need to change anything on his DAG code).
Celery Executor - Every task will be executed in a Celery worker(pod). So the pod is always in Running waiting to get tasks. You can create a dedicated pod for a task if you will explicitly use KubernetesPodOperator.
CeleryKubernetes - Combining both of the above.
Note again that you can use each one of these executors with Kubernetes environment. Keep in mind that all of these are just executors. Airflow has other components like mentioned earlier so it's very OK to deploy Airflow on Kubernetes (Scheduler, Webserver) but to use CeleryExecutor thus the user code (tasks) are not creating new pods automatically.
As for Triggers since you asked about it specifically - It's a feature added in Airflow 2.2: Deferrable Operators & Triggers it allows tasks to defer and release worker slot.

Auto-Scaling removes running task in ECS service (FARGATE)

I am running an ecs service using Fargate on AWS. Each task there completes a single operation and dies (fetching a message from a SQS queue and decode/encode a video file). Now I designed an autoscaling policy like below,
If SQS queue size is more than 5 increment desired count to 1 (repeat every 60 seconds).
If SQS queue size is less than 2 decrement desired count to 1 (repeat every 60 seconds).
But what AWS is doing is than when queue size gets down below 2, it kills out running tasks leaving the corresponding operation "broken". I don't want AWS to kill the running tasks (because they will automatically die within sometime when the command completes) but just to set the desired count to 0 so that the tasks doesn't get "respawned". So literally I want my tasks to be unstoppable during auto-scaling.
How I can achieve this in ECS service and aws_ecs_autoscaling_target. Please note that I am using terraform to provision the service.
Thanks in advance.
I had to solve this issue in a different approach. I had to create a small Lambda function which gets triggered by the cloudwatch alarm and starts a Fargate task using StartTask. This workflow suited well here rather than using autoscaling policy.

AWS Fargate vs Batch vs ECS for a once a day batch process

I have a batch process, written in PHP and embedded in a Docker container. Basically, it loads data from several webservices, do some computation on data (during ~1h), and post computed data to an other webservice, then the container exit (with a return code of 0 if OK, 1 if failure somewhere on the process). During the process, some logs are written on STDOUT or STDERR. The batch must be triggered once a day.
I was wondering what is the best AWS service to use to schedule, execute, and monitor my batch process :
at the very begining, I used a EC2 machine with a crontab : no high-availibilty function here, so I decided to switch to a more PaaS approach.
then, I was using Elastic Beanstalk for Docker, with a non-functional Webserver (only to reply to the Healthcheck), and a Crontab inside the container to wake-up my batch command once a day. With autoscalling rule min=1 max=1, I have HA (if the container crash or if the VM crash, it is restarted by AWS)
but now, to be more efficient, I decided to move to some ECS service, and have an approach where I do not need to have EC2 instances awake 23/24 for nothing. So I tried Fargate.
with Fargate I defined my task (Fargate type, not the EC2 type), and configure everything on it.
I create a Cluster to run my task : I can run "by hand, one time" my task, so I know every settings are corrects.
Now, going deeper in Fargate, I want to have my task executed once a day.
It seems to work fine when I used the Scheduled Task feature of ECS : the container start on time, the process run, then the container stop. But CloudWatch is missing some metrics : CPUReservation and CPUUtilization are not reported. Also, there is no way to know if the batch quit with exit code 0 or 1 (all execution stopped with status "STOPPED"). So i Cant send a CloudWatch alarm if the container execution failed.
I use the "Services" feature of Fargate, but it cant handle a batch process, because the container is started every time it stops. This is normal, because the container do not have any daemon. There is no way to schedule a service. I want my container to be active only when it needs to work (once a day during at max 1h). But the missing metrics are correctly reported in CloudWatch.
Here are my questions : what are the best suitable AWS managed services to trigger a container once a day, let it run to do its task, and have reporting facility to track execution (CPU usage, batch duration), including alarm (SNS) when task failed ?
We had the same issue with identifying failed jobs. I propose you take a look into AWS Batch where logs for FAILED jobs are available in CloudWatch Logs; Take a look here.
One more thing you should consider is total cost of ownership of whatever solution you choose eventually. Fargate, in this regard, is quite expensive.
may be too late for your projects but still I thought it could benefit others.
Have you had a look at AWS Step Functions? It is possible to define a workflow and start tasks on ECS/Fargate (or jobs on EKS for that matter), wait for the results and raise alarms/send emails...

How to integrate spring-xd batch jobs with Control-M scheduler

I'm trying to solve integration between Control-M scheduler and batch jobs running within spring-xd.
In our existing environment, Control-M agents run on the host and batch jobs are triggered via bash script from Control-M.
In the spring-xd architecture a batch job is pushed out into the XD container cluster and will run on an available container. This means however I don't know what XD container the job will run on. I could pin it to a single container with a deployment manifest, but that goes against the whole point of the cluster.
One potential solution.
Run a VM outside the XD container cluster with the Control-M agent and trigger jobs through the XD API via a bash script. The script would need to wait for the job to complete, by either polling for the job completion via the XD API or wait for an event to signal the completion.
Thinking further ahead this could be a solution to triggering batch jobs deployed in PCF.
In a previous life, I had the enterprise scheduler use Perl scripts to interact with the old Spring Batch Admin REST API to start jobs and poll for completion.
So, yes, the same technique should work fine with XD.
You can also tap into the job events.

Jenkins trigger job by another which are running on offline node

Is there any way to do the following:
I have 2 jobs. One job on offline node has to trigger the second one. Are there any plugins in Jenkins that can do this. I know that TeamCity has a way of achieving this, but I think that Jenkins is more constrictive
When you configure your node, you can set Availability to Take this slave on-line when in demand and off-line when idle.
Set Usage as Leave this machine for tied jobs only
Finally, configure the job to be executed only on that node.
This way, when the job goes to queue and cannot execute (because the node is offline), Jenkins will try to bring this node online. After the job is finished, the node will go back to offline.
This of course relies on the fact that Jenkins is configured to be able to start this node.
One instance will always be turn on, on which the main job can be run. And have created the job which will look in DB and if in the DB no running instances, it will prepare one node. And the third job after running tests will clean up my environment.