Trigger a PHP script to run on each Fargate task in an ECS service - amazon-ecs

What is the best way to trigger a PHP script on all running Fargate tasks for an ECS service?
I need to trigger this from another PHP script on one of the ECS tasks.
The reason I need to do this is that I have NGINX FastCGI Cache on these individual ECS tasks and need to purge the cache for all tasks when an admin makes an update in the CMS.
My NGINX configuration has a /purge/[path] endpoint that will purge the cache using fastcgi_cache_purge, and I do currently have a somewhat working hacky solution that looks something like this:
Admin saves in the CMS
PHP snippet runs that:
a) Fetches the number of running ECS tasks using AWS PHP SDK
b) Runs a for loop for the number of ECS tasks running
c) Calls the /purge/[path] URL using cURL (tries multiple times as it sometimes hits the same ECS task)
The above works but is not optimal.
Here are some other solutions that come to mind, but I can't find much information online on how to implement:
Would it be possible perhaps to change the fastcgi_cache_path to a shared file system like AWS EFS, or does that hurt performance not being tmpfs? I see that there's tmpfs support for ECS, but once mounted, would it be shared across the multiple ECS tasks or individually per ECS task?
Using AWS SNS/SQS (write one PHP script that subscribes and another one that publishes an event)
Use Redis pub/sub similar to above (also not sure exactly how to implement and how to start a long-running subscriber when ECS tasks start)
Use ECS Exec and AWS PHP SDK (almost have a working solution that fetches all ECS tasks and loops them through, and executes an "ECS Exec" command, but "--non-interactive" is not available yet, so it doesn't work. Only "--interactive" mode works currently)
Is there an easier/better solution for this? If using any of the above implementations, can someone put me in the right direction on how to implement this using PHP?
Thanks!

This honestly seems like a duplicate of your previous question, but I'll answer:
Would it be possible perhaps to change the fastcgi_cache_path to a shared file system like AWS EFS, or does that hurt performance not
being tmpfs? I see that there's tmpfs support for ECS, but once
mounted, would it be shared across the multiple ECS tasks or
individually per ECS task?
EFS wouldn't be a good option for this because EFS is SLOW. It would kill any performance benefit of having a cache.
A tmpfs on Fargate will not be shared among instances.
Using AWS SNS/SQS (write one PHP script that subscribes and another one that publishes an event)
SNS could work, as discussed in your previous question. SQS would not work at all, since an SQS message is only delivered to one consumer, and you want it to be delivered to all instances.
Use Redis pub/sub similar to above (also not sure exactly how to implement and how to start a long-running subscriber when ECS tasks
start)
Yes this would work, as discussed in your previous question. But of course would require a lot of custom coding.
Use ECS Exec and AWS PHP SDK (almost have a working solution that fetches all ECS tasks and loops them through, and executes an "ECS
Exec" command, but "--non-interactive" is not available yet, so it
doesn't work. Only "--interactive" mode works currently)
This should work. Why do you state that --non-interactive is not available yet? If you just leave off the --interactive parameter, then the command is executed non-interactively.

Related

GKE and Task Queues

I am working on a cloud service platform that consists of getting tasks from users, executing them, and giving back the results.
TL;DR
Is there a way to have a "task queue", where tasks can be inserted via a REST API, and extracted automatically by the Google Kubernetes Engine cluster by guaranteeing an automatic scaling?
Long description
Users can send tasks in parallel, and each task is time consuming and need to be performed on a GPU. So, setting up an auto-scaling GPU cluster is what I thought of.
More in particular, in my idea, users could send tasks/data through a REST API, the REST API provides in filling a task queue, and the task queue itself will feed tasks to workers on the GPU auto-scaling cluster. Of course, there are other details (authentication, database, storage, etc.) that have to be addressed but are not the point of my question.
For reasons I don't specify here, the project is already started on the Google Cloud Platform, so switching to AWS or other providers is not an option.
For what I understood, things seem a bit different from standard Docker-only clusters in AWS, that is, we have to use the Google Kubernetes Engine (GKE) to setup the auto-scaling cluster, even for "simple" GPU-enabled Docker containers.
By looking at the not-so-exhaustive documentation, I know that queues are used, but what I don't know is whether feeding of tasks to the cluster is automatically handled. Also, the so-called "Task Queue" service has been deprecated.
Thank you!
First I thought Cloud Tasks queues may be the answer to your troubles, but more this post seems to promote Cloud Pub/Sub as a better alternative.
After a quick chat with batch developers, the current solution (before the batch service become public) is to adopt a third-party queue system like Slurm.

Run task defintion after stack creation

The question seems simple enough. I have a bunch task definitions and a cluster in my CloudFormation template. When setting up manually I would create a task based on any definition and provide it with a CRON definition. It would then start to run.
I can't seem to find this option in CF? I found service but this only works for tasks that run indefinitely, which mine are not (they run once per day for approx. 10-20 minutes).
After some research I found out about AWS::Events::Rule which people seem to only use in conjunction with Lambda which I do not. I was unable to find any example that referenced FARGATE tasks so I'm not sure it's even possible.
If anyone has any examples of running tasks in CRON using CF, that would be great.
I think that ECS scheduled tasks (cron) would suit you:
Amazon ECS supports the ability to schedule tasks on either a cron-like schedule or in a response to CloudWatch Events. This is supported for Amazon ECS tasks using both the Fargate and EC2 launch types.
This is based on CloudWatch Events which can be used to schedule many things, not only lambda.
To setup it using CloudFormation you can use AWS::Events::Rule with the target of AWS::Events::Rule EcsParameters

Best way to deploy long-running high-compute app to GCP

I have a python app that builds a dataset for a machine learning task on GCP.
Currently I have to start an instance of a VM that we have, and then SSH in, and run the app, which will complete in 2-24 hours depending on the size of the dataset requested.
Once the dataset is complete the VM needs to be shutdown so we don't incur additional charges.
I am looking to streamline this process as much as possible, so that we have a "1 click" or "1 command" solution, but I'm not sure the best way to go about it.
From what I've read about so far it seems like containers might be a good way to go, but I'm inexperienced with docker.
Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down? How would I pass information to the container such as where to get the config file etc? It's conceivable that we will have multiple datasets being generated at the same time based on different config files.
Is there a better gcloud feature that suits our purpose more effectively than containers?
I'm struggling to get information regarding these basic questions, it seems like container tutorials are dominated by web apps.
It would be useful to have a batch-like container service that runs a container until its process completes. I'm unsure whether such a service exists. I'm most familiar with Google Cloud Platform and this provides a wealth of compute and container services. However -- to your point -- these predominantly scale by (HTTP) requests.
One possibility may be Cloud Run and to trigger jobs using Cloud Pub/Sub. I see there's async capabilities too and this may be interesting (I've not explored).
Another runtime for you to consider is Kubernetes itself. While Kubernetes requires some overhead in having Google, AWS or Azure manage a cluster for you (I strongly recommend you don't run Kubernetes yourself) and some inertia in the capacity of the cluster's nodes vs. the needs of your jobs, as you scale the number of jobs, you will smooth these needs. A big advantage with Kubernetes is that it will scale (nodes|pods) as you need them. You tell Kubernetes to run X container jobs, it does it (and cleans-up) without much additional management on your part.
I'm biased and approach the container vs image question mostly from a perspective of defaulting to container-first. In this case, you'd receive several benefits from containerizing your solution:
reproducible: the same image is more probable to produce the same results
deployability: container run vs. manage OS, app stack, test for consistency etc.
maintainable: smaller image representing your app, less work to maintain it
One (beneficial!?) workflow change if you choose to use containers is that you will need to build your images before using them. Something like Knative combines these steps but, I'd stick with doing-this-yourself initially. A common solution is to trigger builds (Docker, GitHub Actions, Cloud Build) from your source code repo. Commonly you would run tests against the images that are built but you may also run your machine-learning tasks this way too.
Your containers would container only your code. When you build your container images, you would pip install, perhaps pip install --requirement requirements.txt to pull the appropriate packages. Your data (models?) are better kept separate from your code when this makes sense. When your runtime platform runs containers for you, you provide configuration information (environment variables and|or flags) to the container.
The use of a startup script seems to better fit the bill compared to containers. The instance always executes startup scripts as root, thus you can do anything you like, as the command will be executed as root.
A startup script will perform automated tasks every time your instance boots up. Startup scripts can perform many actions, such as installing software, performing updates, turning on services, and any other tasks defined in the script.
Keep in mind that a startup script cannot stop an instance but you can stop an instance through the guest operating system.
This would be the ideal solution for the question you posed. This would require you to make a small change in your Python app where the Operating system shuts off when the dataset is complete.
Q1) Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down?
A1) Medium has a great article on installing a package from a private git repo inside a container. You can execute the dataset build before shutting down.
Q2) How would I pass information to the container such as where to get the config file etc?
A2) You can use ENV to set an environment variable. These will be available within the container.
You may consider looking into Docker for more information about container.

AWS Fargate vs Batch vs ECS for a once a day batch process

I have a batch process, written in PHP and embedded in a Docker container. Basically, it loads data from several webservices, do some computation on data (during ~1h), and post computed data to an other webservice, then the container exit (with a return code of 0 if OK, 1 if failure somewhere on the process). During the process, some logs are written on STDOUT or STDERR. The batch must be triggered once a day.
I was wondering what is the best AWS service to use to schedule, execute, and monitor my batch process :
at the very begining, I used a EC2 machine with a crontab : no high-availibilty function here, so I decided to switch to a more PaaS approach.
then, I was using Elastic Beanstalk for Docker, with a non-functional Webserver (only to reply to the Healthcheck), and a Crontab inside the container to wake-up my batch command once a day. With autoscalling rule min=1 max=1, I have HA (if the container crash or if the VM crash, it is restarted by AWS)
but now, to be more efficient, I decided to move to some ECS service, and have an approach where I do not need to have EC2 instances awake 23/24 for nothing. So I tried Fargate.
with Fargate I defined my task (Fargate type, not the EC2 type), and configure everything on it.
I create a Cluster to run my task : I can run "by hand, one time" my task, so I know every settings are corrects.
Now, going deeper in Fargate, I want to have my task executed once a day.
It seems to work fine when I used the Scheduled Task feature of ECS : the container start on time, the process run, then the container stop. But CloudWatch is missing some metrics : CPUReservation and CPUUtilization are not reported. Also, there is no way to know if the batch quit with exit code 0 or 1 (all execution stopped with status "STOPPED"). So i Cant send a CloudWatch alarm if the container execution failed.
I use the "Services" feature of Fargate, but it cant handle a batch process, because the container is started every time it stops. This is normal, because the container do not have any daemon. There is no way to schedule a service. I want my container to be active only when it needs to work (once a day during at max 1h). But the missing metrics are correctly reported in CloudWatch.
Here are my questions : what are the best suitable AWS managed services to trigger a container once a day, let it run to do its task, and have reporting facility to track execution (CPU usage, batch duration), including alarm (SNS) when task failed ?
We had the same issue with identifying failed jobs. I propose you take a look into AWS Batch where logs for FAILED jobs are available in CloudWatch Logs; Take a look here.
One more thing you should consider is total cost of ownership of whatever solution you choose eventually. Fargate, in this regard, is quite expensive.
may be too late for your projects but still I thought it could benefit others.
Have you had a look at AWS Step Functions? It is possible to define a workflow and start tasks on ECS/Fargate (or jobs on EKS for that matter), wait for the results and raise alarms/send emails...

How to run multiple Kubernetes jobs in sequence?

I would like to run a sequence of Kubernetes jobs one after another. It's okay if they are run on different nodes, but it's important that each one run to completion before the next one starts. Is there anything built into Kubernetes to facilitate this? Other architecture recommendations also welcome!
This requirement to add control flow, even if it's a simple sequential flow, is outside the scope of Kubernetes native entities as far as I know.
There are many workflow engine implementations for Kubernetes, most of them are focusing on solving CI/CD but are generic enough for you to use however you want.
Argo: https://applatix.com/open-source/argo/
Added a custom resource deginition in Kubernetes entity for Workflow
Brigade: https://brigade.sh/
Takes a more serverless like approach and is built on Javascript which is very flexible
Codefresh: https://codefresh.io
Has a unique approach where you can use the SaaS to easily get started without complicated installation and maintenance, and you can point Codefresh at your Kubernetes nodes to run the workflow on.
Feel free to Google for "Kubernetes Workflow", and discover the right platform for yourself.
Disclaimer: I work at Codefresh
I would try to use cronjobs and set the concurrency policy to forbid so it doesn't run concurrent jobs.
I have worked on IBM TWS (Workload Automation) which is a scheduler similar to cronjob where you can mention the dependencies of the jobs.
You can specify a job to run only after it's dependencies has run using follows keyword.