What guarantees a process monitor stays running? - supervisord

It is common when deploying web applications to use something like supervisord or circus to keep all your processes up and running. But how do you ensure that these programs themselves stay running? In other words, who will watch the watchers?

Humans are the watchers for tools like supervisor; there are third-party plugins such as supervisord-monitor that provide a UI for monitoring.
But humans cannot monitor 24/7, so supervisor has event listeners. You can use a plugin like superlance to watch the processes supervisor manages, and when a process stops or exits you can configure it to send alerts via email or SMS.
Check out superlance - Superlance is a package of plugin utilities for monitoring and controlling processes that run under supervisor.
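For example, superlance's crashmail can be attached as an event listener so supervisor emails you whenever one of its programs dies. A minimal sketch of the config section (the address is a placeholder):

[eventlistener:crashmail]
command=crashmail -a -m ops@example.com ; -a watches all programs; the address is a placeholder
events=PROCESS_STATE_EXITED ; fired on every exit; crashmail only mails on unexpected ones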

Related

How to monitor Azure VMs with open source software (Nagios/Sensu)

I am looking for a monitoring tool for CPU and memory resources on Azure VMs, and I need alerts as well. We are not looking for paid solutions, but we would like to run a monitoring server. In our cloud we will launch and remove services, so I wonder what tool is best to monitor hosts and services when they are added and removed on a daily basis?
I am considering Ganglia, Nagios, Icinga and Sensu. Any other non-paid option is welcome too, as long as it can monitor the described scenario.

How to configure Celery to send email alerts when tasks fail?

How is it possible to configure Celery to send email alerts when tasks are failing?
For example, I want Celery to notify me when more than 3 tasks fail or more than 10 tasks are being retried.
Is it possible using Celery or a utility (e.g. flower), or do I have to write my own plugin?
Yes, all you need to do is set CELERY_SEND_TASK_ERROR_EMAILS = True; when a task fails, Django will send an email with the traceback to every address listed in the ADMINS setting.
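In a Django settings module that looks roughly like this (a minimal sketch; the addresses and SMTP host are placeholders, and note that these built-in error emails only exist up to Celery 3.x - the feature was removed in Celery 4.0):

# settings.py - minimal sketch of Celery 3.x task error emails
CELERY_SEND_TASK_ERROR_EMAILS = True   # email a traceback whenever a task raises

ADMINS = (
    ('Ops', 'ops@example.com'),        # recipients of the error reports (placeholder)
)

SERVER_EMAIL = 'celery@example.com'    # From: address for the reports (placeholder)
EMAIL_HOST = 'localhost'               # SMTP server used to send them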
As far as I know, it's not possible out of the box.
You could write a custom client on top of Celery or flower, or access RabbitMQ directly.
What I would do (and am doing) is simply log failed tasks and then use something like Graylog2 to monitor the log files; this works for all of your infrastructure, not just Celery.
You can also use something like New Relic, which monitors your processes directly and offers many other features, although its email reporting on exceptions is somewhat limited.
A simple client/monitor probably is the quickest solution.

Open Source Job Scheduler with REST API

Are there any open source job schedulers with a REST API, usable commercially, that support features like:
Tree like Job dependency
Hold & Release
Rerun failed steps
Parallelism
Help would be appreciated :)
NOTE: we are looking for an open source alternative to TWS, Control-M or AutoSys.
JobScheduler would seem to meet your requirements:
Open source: see Open Source and Commercial Licenses
REST API: see Web Service Integration
Parallelism: see Organisation of Jobs and Job Chains
I think that these areas are also covered (I downloaded and trialled the application); see here:
Tree like Job dependency
Hold & Release
Rerun failed steps
I'm not affiliated with SOS GmbH.
ProActive Scheduler is an open source job scheduler.
It is part of the OW2 organization.
It is written in Java, so it comes with both a Java API and a REST API.
It provides workflows, which are sets of tasks with dependencies and more (loop, replicate, branch); upon failure you can control whether a task should be cancelled or restarted.
Parallelism and distribution are at the heart of it.
Commercial Support is provided by Activeeon, the company behind ProActive (full disclosure: I work for Activeeon).
You might be interested in DKron
Dkron is a system service that runs scheduled jobs at given intervals or times, just like the Unix cron service but distributed across several machines in a cluster. If the machine acting as leader fails, a follower will take over and keep running the scheduled jobs without human intervention. Dkron is open source and freely available.
http://dkron.io/
While not open source, a more cost-effective job scheduler with a REST API and support for the features listed is ActiveBatch workload automation and job scheduling. To be up-front, I work for the company, but our customers love how easily they can extend their automated processes to connect to any application, any service, any server with our REST API adapter. You can get more information here: https://www.advsyscon.com/en-us/activebatch/rest-api-adapter

LoadRunner suggested approach to monitoring cloud based JBoss infrastructure

My project's technical platform consists of a cloud-based setup with JBoss nodes running on Linux VMs and databases connected to these further down the stack.
Obviously I can configure each JBoss instance to accept remote monitoring via JMX and use VisualVM to monitor them. But as the number of JBoss instances (combined app server and web app server) grows, the monitoring gets out of hand because there are a lot of nodes to watch. I have been thinking about using our JBoss Operations Network (JON) and monitoring at that level of abstraction, but is there a way to configure LoadRunner to monitor, e.g., through JON?
General question:
Does anybody have experience monitoring a cloud-based JBoss infrastructure through LoadRunner, or do you monitor through something like JON instead when running the load test?
Do all of the monitoring in SiteScope:
Base operating system monitors through SiteScope
JMX monitoring in SiteScope
(Alternate route) SNMP agents for JBoss and your OS, through SiteScope
When you go to run the test, connect to your SiteScope instance from your LoadRunner/Performance Center controller and pull in the SiteScope stats. As an alternative to SiteScope, Business Availability Center can also be used.

Queuing systems - what is a good way to start up multiple workers?

How have you set up one or more worker scripts for queue-oriented systems?
How do you arrange to start up - and restart if necessary - worker scripts as required? (I'm thinking of tools such as init.d/, the Ruby-based 'god', DJB's daemontools, etc.)
I'm developing an asynchronous queue/worker system, in this case using PHP & Beanstalkd (though the actual language and daemon aren't important). The tasks themselves are not too hard: an array with the command and its parameters is encoded into JSON for transport through the Beanstalkd daemon, and a worker script picks it up and actions it as required.
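For illustration, that round trip looks roughly like this - a Python sketch using the beanstalkc client rather than the PHP of the original setup, with a made-up payload:

import json
import beanstalkc  # assumed client library; the original workers are PHP

queue = beanstalkc.Connection(host='localhost', port=11300)

# Producer side: encode the command and its parameters as JSON and enqueue it.
queue.put(json.dumps({'command': 'resize_image', 'params': {'id': 42}}))

# Worker side: block until a job arrives, decode it, act on it, then delete it.
job = queue.reserve()
task = json.loads(job.body)
# ... dispatch on task['command'] here ...
job.delete()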
There are a number of other similar queue/worker setups out there, such as Starling, Gearman, Amazon's SQS and other more 'enterprise' oriented systems like IBM's MQ and RabbitMQ. If you run something like Gearman or SQS, how do you start and control the worker pool? The question is about the initial worker startup, and then being able to add extra workers and shut them down at will (though I can send a message through the queue to shut them down, as long as some 'watcher' won't automatically restart them). This is not a PHP problem; it's about straight Unix processes: setting up one or more processes to run on startup, or adding more workers to the pool.
A bash script to loop a script is already in place - this calls the PHP script which then collects and runs tasks from the queue, occasionally exiting to be able to clean itself up (it can also pause a few seconds on failure, or via a planned event). This works fine, and building the worker processes on top of that won't be very hard at all.
Getting a good worker controller system is about flexibility, starting one or two automatically on a machine start, and being able to add a couple more from the command line when the queue is busy, shutting down the extras when no longer required.
I've been helping a friend who's working on a project that involves a Gearman-based queue that will dispatch various asynchronous jobs to various PHP and C daemons on a pool of several servers.
The workers have been designed to behave just like classic unix/linux daemons, thanks to simple shell scripts in /etc/init.d/ and commands like:
invoke-rc.d myWorker start|stop|restart|reload
This mechanism is simple and efficient. And as it relies on standard Linux features, even people with limited knowledge of your app can launch a daemon or stop one, if they know what it's called system-wise ("myWorker" in the above example).
Another advantage of this mechanism is that it makes managing your worker pool easy as well. You could have 10 daemons on your machine (myWorker1, myWorker2, ...) and have a "worker manager" start or stop them depending on the queue length, as sketched below. And as these commands can be run through ssh, you can easily manage several servers.
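A hypothetical version of that manager might be little more than a loop over the services (the myWorkerN names, the pool size of 10 and the default target of 2 are all assumptions):

#!/bin/sh
# Hypothetical worker-manager sketch: scale the pool to the requested size by
# starting or stopping the init.d services described above (myWorker1..myWorker10).
DESIRED=${1:-2}                 # target number of running workers
for i in $(seq 1 10); do
  if [ "$i" -le "$DESIRED" ]; then
    invoke-rc.d "myWorker$i" start
  else
    invoke-rc.d "myWorker$i" stop
  fi
done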
This solution may sound cheap, but if you build it with well-coded daemons and reliable management scripts, I don't see why it would be less efficient than big-bucks solutions, for any average (as in "non critical") project.
Real message queuing middleware like WebSphere MQ or MSMQ offers "triggers", where a service that is part of the MQM starts a worker when new messages are placed into a queue.
AFAIK, no "web service" queuing system can do that, by the nature of the beast. However, I have only looked hard at SQS. There you have to poll the queue, and in Amazon's case overly eager polling is going to cost you some real $$.
I've recently been working on such a tool. It's not entirely finished (though it shouldn't take more than a few more days before I hit something I could call 1.0) and clearly not ready for production yet, but the important parts are already coded. Anybody can have a look at the code here: https://gitorious.org/workers_pool.
Supervisor is a good monitoring tool. It includes a web UI where you can monitor and manage workers.
Here is a simple config file for a worker:
[program:demo]
command=php worker.php ; php command to run worker file
numprocs=2 ; number of processes
process_name=%(program_name)s_%(process_num)03d ; unique name for each process if numprocs > 1
directory=/var/www/demo/ ; directory containing worker file
stdout_logfile=/var/www/demo/worker.log ; log file location
autostart=true ; auto start program when supervisor starts
autorestart=true ; auto restart program if it exits
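After dropping a file like this into supervisor's include directory (commonly /etc/supervisor/conf.d/ on Debian-style installs, though the path varies), you would load it and check the workers with supervisorctl, roughly:

supervisorctl reread        # detect the new/changed config file
supervisorctl update        # start the new "demo" program group
supervisorctl status demo:* # should show demo_000 and demo_001 as RUNNING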