How do I schedule one-time tasks from a Perl CGI application? - perl

I am writing an application to allow users to schedule one-time long-running tasks from a web application (Linux/Apache/CGI::Application). To do this I use the Schedule::At module which is the Perl interface to the "at" command. Since the scheduled tasks are not repeating, I am not considering "cron". I have two issues with "at" though:
1. Scheduling works fine when my CGI application runs under the suexec wrapper, but not when scheduled by the owner of the Apache process. How can I get scheduling to work in both environments (suexec and no-suexec)?
2. It appears that the processes scheduled by "at" or Schedule::At have no failure reporting, and I sometimes find that scheduled tasks fail silently. Is there some way to log the fact that the scheduled task (not the scheduler itself) has failed to run?
I am not fixed on "at" and am open to using other, more robust, scheduling methods if there are any.
Thank you for your attention.

I've heard good things about TheSchwartz. It doesn't have a delay-until feature, though; you'd still submit the jobs via at, but that should solve both of the problems you list above, as long as your submit_job script is simple.
(As a caveat, I've only used Gearman. I think you'd want a reliable job queue for this, a "fire and forget" mechanism, so that you can keep your submit_job dumb.)
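On the silent-failure point, one approach that works regardless of the scheduler is to have "at" run a small wrapper instead of the task itself, so that the wrapper records the outcome. This is only a minimal sketch in Python (the same idea ports directly to Perl); the function name run_and_log and the log format are made up for illustration and are not part of Schedule::At or "at".

```python
import logging
import subprocess


def run_and_log(command, log_path):
    """Run `command` (a list of arguments) and record its outcome in `log_path`.

    Returns the exit status of the command, or 127 if it could not be started.
    """
    logger = logging.getLogger("scheduled_task")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        try:
            result = subprocess.run(command, capture_output=True, text=True)
        except OSError as exc:
            # The task could not even be started (missing file, no permission).
            logger.error("could not start %s: %s", command, exc)
            return 127
        if result.returncode != 0:
            # The task ran but reported failure; keep its stderr for diagnosis.
            logger.error("%s exited with status %d; stderr: %s",
                         command, result.returncode, result.stderr.strip())
        else:
            logger.info("%s completed", command)
        return result.returncode
    finally:
        # Detach the handler so repeated calls do not write duplicate lines.
        logger.removeHandler(handler)
        handler.close()
```

The job handed to "at" would then be the wrapper plus the real command line, and the CGI application can read the log (or a status table written at the same point) to report failures to the user.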

Related

Out of box distributed job queue solution

Is there an existing out-of-the-box job queue framework? The basic idea is:
1. Someone enqueues a job with status New.
2. (Multiple) workers get a job and work on it, marking the job as Taken. A job can run on at most one worker at a time.
3. Something monitors worker status; if a running job exceeds a predefined timeout (possibly because of a worker health issue), it is re-queued with status New.
4. Once a worker completes a task, it marks the task as Completed in the queue.
5. Something keeps cleaning up completed tasks. Alternatively, at step #4, the worker simply dequeues the task when it completes.
From my investigation, things like Kafka (pub/sub), MQ (push/pull and pub/sub), or a cache (Redis, Memcached) are mostly sufficient for this work. However, they all require some development around their core functionality to become a fully functional job queue.
I also looked into relational databases; those that support "SELECT ... FOR UPDATE SKIP LOCKED" are also good candidates, but this again requires a daemon between the DB and the workers, which means extra effort.
I also looked into cloud solutions such as Azure Queue storage; my assessment was similar.
So my question is: is there an out-of-the-box job queue solution, dedicated to job queuing and nothing else, that doesn't take much effort to set up?
Thanks
Take a look at Python Celery. https://docs.celeryproject.org/en/stable/getting-started/introduction.html
The default mode uses RabbitMQ as the message broker, but other options are available. Results can be stored in a DB if needed.
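For reference, the job lifecycle listed in the question (New, Taken, Completed, plus timeout-based requeue and cleanup) is small enough to sketch directly. The following is not an out-of-the-box product and not how Celery works internally; it is only an illustration using Python's built-in sqlite3 as a stand-in for a real database, and the table and function names are made up.

```python
import sqlite3
import time


def open_queue(path=":memory:"):
    """Open (and if needed create) the jobs table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'New',"
        " taken_at REAL)"
    )
    return db


def enqueue(db, payload):
    """Step 1: add a job with status New and return its id."""
    with db:
        return db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,)).lastrowid


def take(db, now=None):
    """Step 2: claim the oldest New job; return (id, payload) or None."""
    now = time.time() if now is None else now
    with db:
        row = db.execute(
            "SELECT id, payload FROM jobs WHERE status = 'New' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check in the WHERE clause means only one worker can win the claim.
        claimed = db.execute(
            "UPDATE jobs SET status = 'Taken', taken_at = ? WHERE id = ? AND status = 'New'",
            (now, row[0]),
        ).rowcount
        return row if claimed == 1 else None


def requeue_stale(db, timeout, now=None):
    """Step 3: put jobs that have been Taken for longer than `timeout` seconds back to New."""
    now = time.time() if now is None else now
    with db:
        return db.execute(
            "UPDATE jobs SET status = 'New', taken_at = NULL"
            " WHERE status = 'Taken' AND taken_at < ?",
            (now - timeout,),
        ).rowcount


def complete(db, job_id):
    """Step 4: mark a Taken job as Completed."""
    with db:
        db.execute(
            "UPDATE jobs SET status = 'Completed' WHERE id = ? AND status = 'Taken'",
            (job_id,),
        )


def purge_completed(db):
    """Step 5: delete Completed jobs and return how many were removed."""
    with db:
        return db.execute("DELETE FROM jobs WHERE status = 'Completed'").rowcount
```

On a database that supports it, the SELECT in take() is where "FOR UPDATE SKIP LOCKED" would go, so that concurrent workers skip rows already being claimed instead of racing for the same one.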

Pause Scheduled tasks in SCDF

Hi, I'm running batch jobs via SCDF in an OpenShift environment. All the jobs have been scheduled through the scheduling option in SCDF. Is there a way to pause or hold those jobs from executing instead of destroying the schedules? Since there are many jobs, we have to recreate the schedules for all of them every time.
Thanks.
We have an open issue: spring-cloud/spring-cloud-dataflow#3276 to add support for it.
Feel free to update the issue with your use-case requirements and the acceptance criteria. Better yet, it'd be great if you can contribute adding support for it in a PR; we would love to collaborate and release it.

How to debug celery delays and errors?

I am continuing a Django project from someone who used Celery along with Mandrill. There are daily reports that are sent to customers, and for some reason not a single mail is sent for three days; the mails accumulate and are all sent together after three days. Since I am new to Celery, I want to know how to debug Celery delays and errors. What are the popular commands, and what execution path should I follow?
Short tips:
Set debug=True in the Celery config; it will give you the registration and execution time for every task.
Install Flower, a popular tool for monitoring tasks.
Use Sentry for handy error tracking and aggregation.
Happy debugging ;)

PowerShell session/environment isolation - Jobs sharing same context?

I'm testing a workflow runbook that utilizes Add-Type to add some custom C# code.
All of a sudden I started getting 'type already exists' errors on subsequent test jobs, as if a new PSSession is not being created.
In other words, it looks like new jobs are sharing the same execution context. I only get this locally if I try to run the same command twice per PS instance.
The type in question is a static class with some Extension methods. Since it also happens to be the first type declared in the source block, I don't doubt other non-static types would throw errors as well.
I've executed this handfuls of times already, so I fully expect that 'eventually' this will stop happening, but I can't seem to force it, and I have no idea what I could've done to trip it into this situation, either.
Seeing evidence of shared execution contexts across jobs like this - even (especially?) if only temporal - makes me wonder if some or all of the general execution inconsistencies we've seen in the past when making & deploying changes & performing subsequent tests soon-after are related to this.
I'm tempted to think that this is simply a part of the difference between a Test Job and a 'real' one, but that raises questions about the validity of the Test jobs themselves WRT mimicking Published Jobs.
Are all Azure Automation Jobs supposed to execute in Isolation? Can this be controlled/exploited by a developer?
Each automation account has its own isolated sandboxes where its jobs run. Those sandboxes are distributed among a number of worker machines. For test jobs, to try to improve job start time since [make code change, retest] over and over is very common, Automation reuses the same sandbox as used for previous test jobs of this runbook, if the sandbox has not been cleaned up yet, so that sandboxes do not have to be spun up for each unique test job (sandbox creation is one reason for a longer job start time than desired). Due to this behavior, if you execute test jobs of the same runbook within a short amount of time, you will get the behavior you're seeing above.
However, even for production jobs, jobs of the same automation account (across runbooks) can share the same sandboxes. We randomly distribute jobs across our worker machines, so it's possible job A is queued for execution and is placed on worker W, then 5 minutes later, job B is queued for execution and is placed on worker W as well. If job A and job B are of the same automation account and have the same "demands" in terms of modules / module versions, they will be placed in the same sandbox, if job A's sandbox is still around. "Module / module version demands" does not mean the modules used by the runbook, but the modules / latest module versions that existed in the automation account at the time when the job was started / runbook was scheduled (for jobs started via schedule) / runbook was assigned to a webhook (for jobs started via webhook).
In terms of resolving your specific problem, you could surround Add-Type with a try/catch statement, or maybe use Add-Type -IgnoreWarnings.

Queuing systems - what is a good way to start up multiple workers?

How have you set-up one or more worker scripts for queue-oriented systems?
How do you arrange to startup - and restart if necessary - worker scripts as required? (I'm thinking about such tools as init.d/, Ruby-based 'god', DJB's Daemontools, etc, etc)
I'm developing an asynchronous queue/worker system, in this case using PHP & Beanstalkd (though the actual language and daemon aren't important). The tasks themselves are not too hard - encoding an array with the commands and parameters into JSON for transport through the Beanstalkd daemon, picking them up in a worker script to action them as required.
There are a number of other similar queue/worker setups out there, such as Starling, Gearman, Amazon's SQS and other more 'enterprise' oriented systems like IBM's MQ and RabbitMQ. If you run something like Gearman or SQS, how do you start and control the worker pool? The question is about the initial worker startup, and then being able to add extra workers and shut them down at will (though I can send a message through the queue to shut them down, as long as some 'watcher' won't automatically restart them). This is not a PHP problem; it's about plain Unix process management: setting up one or more processes to run on startup, or adding more workers to the pool.
A bash script to loop a script is already in place - this calls the PHP script which then collects and runs tasks from the queue, occasionally exiting to be able to clean itself up (it can also pause a few seconds on failure, or via a planned event). This works fine, and building the worker processes on top of that won't be very hard at all.
Getting a good worker controller system is about flexibility, starting one or two automatically on a machine start, and being able to add a couple more from the command line when the queue is busy, shutting down the extras when no longer required.
I've been helping a friend who's working on a project that involves a Gearman-based queue that will dispatch various asynchronous jobs to various PHP and C daemons on a pool of several servers.
The workers have been designed to behave just like classic Unix/Linux daemons, thanks to simple shell scripts in /etc/init.d/ and commands like:
invoke-rc.d myWorker start|stop|restart|reload
This mechanism is simple and efficient. And as it relies on standard linux features, even people with a limited knowledge of your app can launch a daemon or stop one, if they know how it's called system-wise (aka "myWorker" in the above example).
Another advantage of this mechanism is it makes your workers pool management easy as well. You could have 10 daemons on your machine (myWorker1, myWorker2, ...) and have a "worker manager" start or stop them depending on the queue length. And as these commands can be run through ssh, you can easily manage several servers.
This solution may sound cheap, but if you build it with well-coded daemons and reliable management scripts, I don't see why it would be less efficient than big-bucks solutions, for any average (as in "non critical") project.
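To make the "worker manager" idea a bit more concrete, here is a rough sketch of the decision logic only, written in Python. It assumes the daemons are named myWorker1, myWorker2, ... as in the example above and that workers 1..running are currently up; the function name, the jobs_per_worker ratio and the limits are made-up parameters, not part of any real tool.

```python
import math


def plan_worker_commands(queue_length, running, jobs_per_worker=10,
                         min_workers=1, max_workers=10):
    """Return invoke-rc.d commands that move the pool toward the desired size.

    `running` is the number of workers currently up, assumed to be
    myWorker1..myWorker<running>.
    """
    # One worker per `jobs_per_worker` queued jobs, clamped to the allowed range.
    wanted = math.ceil(queue_length / jobs_per_worker)
    wanted = max(min_workers, min(max_workers, wanted))
    if wanted > running:
        # Start the next unused daemons.
        return [["invoke-rc.d", f"myWorker{n}", "start"]
                for n in range(running + 1, wanted + 1)]
    # Stop the highest-numbered daemons first (empty list when nothing changes).
    return [["invoke-rc.d", f"myWorker{n}", "stop"]
            for n in range(running, wanted, -1)]
```

Each returned command is an argument list, so it can be run locally with something like subprocess.run, or prefixed with an ssh invocation to manage another server.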
Real message queuing middleware like WebSphere MQ or MSMQ offer "triggers" where a service that is part of the MQM will start a worker when new messages are placed into a queue.
AFAIK, no "web service" queuing system can do that, by the nature of the beast. However I have only looked hard at SQS. There you have to poll the queue, and in Amazon's case overly eager polling is going to cost you some real $$.
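If you do have to poll, a common way to keep the request count (and the bill) down is to back off while the queue is empty and return to fast polling once messages show up. A minimal sketch in Python, where fetch and handle are placeholders for whatever your queue client provides (they are not SQS API calls), and the delay values are arbitrary:

```python
import time


def poll_with_backoff(fetch, handle, min_delay=1.0, max_delay=60.0,
                      max_polls=None, sleep=time.sleep):
    """Call `fetch()` repeatedly, doubling the wait after each empty poll.

    `fetch` returns a message, or None when the queue is empty.
    `max_polls` limits the number of fetches (None means run forever).
    """
    delay = min_delay
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        message = fetch()
        if message is not None:
            handle(message)
            delay = min_delay  # queue is active again: poll eagerly
            continue
        sleep(delay)
        delay = min(delay * 2, max_delay)  # empty queue: wait longer next time
```

The sleep parameter is only there so the loop can be exercised without actually waiting.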
I've recently been working on such a tool. It's not entirely finished (though it should take more than a few more days before I hit something I could call 1.0) and clearly not ready for production yet, but the important parts are already coded. Anybody can have a look at the code here: https://gitorious.org/workers_pool.
Supervisor is a good monitor tool. It includes a web UI where you can monitor and manage workers.
Here is a simple config file for a worker.
[program:demo]
command=php worker.php ; php command to run worker file
numprocs=2 ; number of processes
process_name=%(program_name)s_%(process_num)03d ; unique name for each process if numprocs > 1
directory=/var/www/demo/ ; directory containing worker file
stdout_logfile=/var/www/demo/worker.log ; log file location
autostart=true ; auto start program when supervisor starts
autorestart=true ; auto restart program if it exits