How to load balance jobs using Spring Batch when different nodes have different times? - quartz-scheduler

We have many batch jobs to handle.
We have 7 nodes with the same application deployed (we use JBoss AS 7.1.1 as the application server), and we use Spring Batch with the Quartz scheduler to schedule jobs. That works just fine.
But one of our nodes has a different time than the others (e.g. suppose we have 3 nodes A, B and C: when it is 12:00:00 on C, it is 11:58:00 on A and B), and all of these nodes are maintained by the client.
When a trigger fires (we use cron triggers), the job runs on a single node only.
So when we need to fire more than one job at a specific time (say 12:00), all of them run on a single node, C, because their triggers fire there before they fire on the other nodes (12:00 arrives on C before it arrives on A and B).
Is there a mechanism that takes a centralized time as the reference for triggering all batch processes (i.e. do not run the batch job when it is 12:00 on C, but when it is 12:00 according to the DB)?
Thanks in advance :).

Spring Batch provides facilities to launch jobs via messages in the spring-batch-integration module. I'd recommend managing the scheduling from a central point and having it send messages to the servers, to be picked up based on each server's availability to run the job. This would also address the time synchronization issue, since the scheduling would be handled at a central point.
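A minimal sketch of that idea using only JDK classes, not spring-batch-integration itself: one central scheduler owns the clock and publishes launch requests to a shared queue, and each node (simulated here as a worker thread) takes a request only when it is free. The queue stands in for the messaging channel, and the class, method and node names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch only: plain JDK classes standing in for spring-batch-integration messaging.
// The central scheduler decides WHEN jobs run; nodes decide WHO runs them by taking requests when free.
final class CentralDispatchDemo {

    /** Publishes every job name once and returns which (simulated) node ran each job. */
    static Map<String, String> dispatch(List<String> jobNames, int nodeCount) {
        BlockingQueue<String> launchRequests = new LinkedBlockingQueue<>(); // stands in for the message channel
        Map<String, String> ranOn = new ConcurrentHashMap<>();
        CountDownLatch done = new CountDownLatch(jobNames.size());
        ExecutorService nodes = Executors.newFixedThreadPool(nodeCount);

        for (int i = 0; i < nodeCount; i++) {
            String nodeName = "node-" + i;
            nodes.execute(() -> {
                try {
                    while (true) {
                        String job = launchRequests.take(); // blocks until a request is available
                        ranOn.put(job, nodeName);           // "run" the job on this node
                        Thread.sleep(10);                   // simulated work, so idle nodes pick up the next request
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();     // shutdownNow() interrupts take()/sleep() and ends the loop
                }
            });
        }

        // The central scheduler fires at ITS notion of "12:00" and publishes all due jobs at once.
        launchRequests.addAll(jobNames);

        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            nodes.shutdownNow();
        }
        return ranOn;
    }

    public static void main(String[] args) {
        System.out.println(dispatch(List.of("jobA", "jobB", "jobC", "jobD"), 3));
    }
}
```

Because only the central point consults a clock, the nodes' own clocks no longer decide which node gets the work; whichever node is idle takes the next request.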

Ask your client to synchronize the servers using NTP. All of your servers should have the same time, period. You will have a bunch of other problems if you allow your servers to stay out of sync with each other.
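For intuition on how NTP corrects a skew like the 2 minutes in the question: the client timestamps its request and the reply, the server timestamps receipt and send, and the standard NTP formulas give offset = ((t1 - t0) + (t2 - t3)) / 2 and delay = (t3 - t0) - (t2 - t1). The class below only does that arithmetic (assuming symmetric network latency); it is not an NTP client, and the sample timestamps are made up.

```java
// Arithmetic only, not an NTP client. Timestamps are in milliseconds:
// t0 = client send (client clock), t1 = server receive (server clock),
// t2 = server send (server clock), t3 = client receive (client clock).
final class NtpMath {

    /** Estimated clock offset, assuming equal latency both ways (positive = client is behind the server). */
    static long offsetMillis(long t0, long t1, long t2, long t3) {
        return ((t1 - t0) + (t2 - t3)) / 2;
    }

    /** Round-trip network delay, excluding the server's processing time. */
    static long delayMillis(long t0, long t1, long t2, long t3) {
        return (t3 - t0) - (t2 - t1);
    }

    public static void main(String[] args) {
        // Client (like nodes A and B) is 2 minutes behind the server; one-way latency is 10 ms.
        long t0 = 0, t1 = 120_010, t2 = 120_015, t3 = 25;
        System.out.println(offsetMillis(t0, t1, t2, t3)); // 120000
        System.out.println(delayMillis(t0, t1, t2, t3));  // 20
    }
}
```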

Related

Spring Batch cannot obtain job lock via DB (Postgres)

I have several instances of an "orchestrator" microservice that run on different nodes and execute Spring Batch jobs. Only one instance should be "active" and run the job at a time. The jobs are scheduled twice a day via the @Scheduled annotation with a cron expression.
So the microservice tries to execute jobs with a single identifying JobParameter, LocalDateTime.now() truncated to seconds, to compensate for the time difference between the OpenShift nodes my instances run on.
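A quick illustration of the truncation described above, and of its limit: two nodes whose clocks differ by only 100 ms still produce different identifying values if the trigger fires around a second boundary, so truncation narrows the window but cannot guarantee identical values. The timestamps are made up for the example.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

final class TruncatedJobParam {

    /** The identifying value as described in the question: the trigger time truncated to whole seconds. */
    static LocalDateTime identifyingTime(LocalDateTime triggerTime) {
        return triggerTime.truncatedTo(ChronoUnit.SECONDS);
    }

    public static void main(String[] args) {
        // Both nodes fire within the same second: the truncated values are equal.
        LocalDateTime a1 = LocalDateTime.of(2021, 1, 1, 12, 0, 0, 100_000_000);
        LocalDateTime b1 = LocalDateTime.of(2021, 1, 1, 12, 0, 0, 900_000_000);
        System.out.println(identifyingTime(a1).equals(identifyingTime(b1))); // true

        // Only 100 ms apart, but straddling a second boundary: the values differ.
        LocalDateTime a2 = LocalDateTime.of(2021, 1, 1, 11, 59, 59, 950_000_000);
        LocalDateTime b2 = LocalDateTime.of(2021, 1, 1, 12, 0, 0, 50_000_000);
        System.out.println(identifyingTime(a2).equals(identifyingTime(b2))); // false
    }
}
```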
The underlying DB is Postgres 12, with the transaction isolation level set to repeatable read.
The problem seems impossible to me, but it happens and always reproduces. Job execution fails on each microservice instance with a DuplicateKeyException on the composite PK, which is (not surprisingly) the job name plus the identifying parameters' hash.
The question is: how is this possible, and what am I missing? Any ideas?
Sorry for such a late answer. There was no problem at all: the locks work correctly regardless of the transaction isolation level. We have two OpenShift clusters, active and inactive. The jobs were running on the "inactive" nodes, which are called that only because no client traffic is routed to them. As it turned out, production support had no access to the "inactive" nodes' logs :)
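The answer relies on the job-instance key acting as a lock: whichever instance first inserts (job name, hash of identifying parameters) wins, and every other instance gets a duplicate-key error, whatever the isolation level. Below is an in-memory sketch of that idea. The ConcurrentHashMap stands in for the database's unique constraint, and the MD5-based key is an assumption loosely modelled on the "job name plus parameters' hash" described in the question, not Spring Batch's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

final class JobInstanceLock {

    // Stands in for a table with a unique constraint on (job name, job key).
    private final Map<String, String> instances = new ConcurrentHashMap<>();

    /** Hypothetical key: MD5 hex of the identifying parameters in sorted "name=value;" form. */
    static String jobKey(Map<String, String> identifyingParams) {
        StringBuilder sb = new StringBuilder();
        new TreeMap<>(identifyingParams).forEach((k, v) -> sb.append(k).append('=').append(v).append(';'));
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(sb.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e); // not expected on a standard JDK
        }
    }

    /** Returns true if this node created the instance, false if another node already holds it. */
    boolean tryCreate(String jobName, Map<String, String> identifyingParams, String node) {
        String rowKey = jobName + "|" + jobKey(identifyingParams);
        // putIfAbsent is atomic, like an INSERT that fails on a duplicate key.
        return instances.putIfAbsent(rowKey, node) == null;
    }

    public static void main(String[] args) {
        JobInstanceLock table = new JobInstanceLock();
        Map<String, String> params = Map.of("runTime", "2021-01-01T12:00:00");
        System.out.println(table.tryCreate("nightlyJob", params, "inactive-node")); // true: got there first
        System.out.println(table.tryCreate("nightlyJob", params, "active-node"));   // false: duplicate key
    }
}
```

This mirrors the story above: the visible instances lost the race to instances whose logs nobody could see.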

Chronos + Mesosphere. How to execute tasks in parallel?

Good day everyone.
I have a single server for Chronos, Mesos and ZooKeeper, and I want to use Chronos to run my scripts daily: some scripts today, some tomorrow, and so on.
The problem is that when I launch tasks one after another, only the first one executes correctly; the other one gets lost somewhere. If I launch the first, pause for 3-4 seconds and then launch the other, they are both launched, but sequentially.
And I need to run them in parallel.
Can someone give me a hint on this? Maybe there are some settings I must change?
You should set a start time in UTC for both tasks, with a repeating period of 24 hours. In that case, there is no reason why your tasks should not execute in parallel. Check the Chronos logs and the task logs in the Mesos sandbox for errors.
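Chronos expresses a schedule as an ISO 8601 repeating interval of the form R/start/period. A small helper that builds one from a UTC start time; the start date is a made-up example, and giving both jobs the same schedule string is what lets them be launched together.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class ChronosSchedule {

    /** ISO 8601 repeating interval; "R" with no count means repeat indefinitely. */
    static String daily(LocalDateTime startUtc) {
        // Instant.toString() yields e.g. 2014-10-10T18:32:00Z; Duration.ofHours(24) prints as PT24H.
        return "R/" + startUtc.toInstant(ZoneOffset.UTC) + "/" + Duration.ofHours(24);
    }

    public static void main(String[] args) {
        String schedule = daily(LocalDateTime.of(2014, 10, 10, 18, 32));
        System.out.println(schedule); // R/2014-10-10T18:32:00Z/PT24H
    }
}
```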
You can certainly run all of these components (Chronos, master, slave, and ZK) on the same machine, although ZK really becomes valuable once you have HA with multiple masters.
As user4103259 suggested, check the master and slave logs for that LOST/failed taskId to see what exactly happened to it. A task could go LOST/failed for numerous reasons, anywhere along the task launch/running/completing process.

Quartz Scheduler using database

I am using Quartz to schedule cron jobs in my web application, with an Oracle database to store the jobs and related info. When I add jobs in the database, I need to restart the server/application (Tomcat) for the new jobs to get scheduled. How can I add jobs to the database and have them work without restarting the server?
I assume you mean you are using JDBCJobStore? In that case it is not ideal to make direct changes to the database tables that store the job data. However, you could set up a separate job that runs every X minutes/hours, checks whether there are new jobs in the database that need to be scheduled, and schedules them as usual.
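A sketch of that workaround, assuming a hypothetical `pendingJobs` source (e.g. your own table of job definitions, not Quartz's internal tables) and a hypothetical `scheduleJob` callback that would wrap the real Scheduler API call. The poller remembers what it has already scheduled, so each pass only adds new entries.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class NewJobPoller {

    private final Supplier<List<String>> pendingJobs;  // hypothetical: reads job names from your own table
    private final Consumer<String> scheduleJob;        // hypothetical: would call the Quartz Scheduler API
    private final Set<String> alreadyScheduled = new HashSet<>();

    NewJobPoller(Supplier<List<String>> pendingJobs, Consumer<String> scheduleJob) {
        this.pendingJobs = pendingJobs;
        this.scheduleJob = scheduleJob;
    }

    /** One polling pass: schedules every job not seen before and returns the newly scheduled names. */
    List<String> pollOnce() {
        List<String> added = new ArrayList<>();
        for (String job : pendingJobs.get()) {
            if (alreadyScheduled.add(job)) { // add() returns false for jobs scheduled in an earlier pass
                scheduleJob.accept(job);
                added.add(job);
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<>(List.of("reportJob"));
        NewJobPoller poller = new NewJobPoller(() -> table, job -> System.out.println("scheduling " + job));
        poller.pollOnce();            // schedules reportJob
        table.add("cleanupJob");      // a row is added while the application is running
        poller.pollOnce();            // schedules only cleanupJob
    }
}
```

To run it every X minutes, `pollOnce` could itself be registered as a periodic task (for example with `ScheduledExecutorService.scheduleAtFixedRate`), or as a Quartz job as suggested above.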
Add jobs via the Scheduler API.
http://www.quartz-scheduler.org/docs/best_practices.html

Websphere 7 clustered deployment

We have a J2EE application packaged as an EAR file and deployed on WAS 7. For high availability, it needs to be deployed in 3 clusters. We have a Quartz Scheduler class whose job is to upload data from one database to another daily at 2:00 am.
The problem is that if the EAR is deployed on 3 different nodes for load balancing and high availability, all 3 copies will trigger the upload at the same time. How can we handle this? Is it possible to solve it with some configuration in the WAS 7 environment? Any help or suggestions would be appreciated.
Thanks
You have two possibilities:
The Quartz database backend, where all your nodes connect to the same database, which Quartz uses to synchronize task execution. This can be configured to prevent the task from running on several nodes simultaneously.
EJB 3.x timer. See for instance this example. This, however, only ensures that one member of each cluster fires the timer.
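For the first option, Quartz clustering is switched on through its configuration properties. A sketch that builds them in Java; the property keys are standard Quartz settings, while the instance name, thread count, data source name and check-in interval are placeholder values, and the data source definition itself is omitted. Inside an application server with container-managed transactions, JobStoreCMT may fit better than JobStoreTX. The Quartz documentation also warns against clustering nodes whose clocks are not kept in sync by a time-sync service.

```java
import java.util.Properties;

final class QuartzClusterConfig {

    /** Properties for a clustered JDBC job store; every node would use the same set. */
    static Properties clustered(String dataSourceName) {
        Properties p = new Properties();
        p.setProperty("org.quartz.scheduler.instanceName", "UploadScheduler"); // placeholder; identical on all nodes
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");              // unique per node; AUTO generates one
        p.setProperty("org.quartz.threadPool.threadCount", "5");               // placeholder pool size
        p.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        p.setProperty("org.quartz.jobStore.driverDelegateClass", "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        p.setProperty("org.quartz.jobStore.dataSource", dataSourceName);
        p.setProperty("org.quartz.jobStore.isClustered", "true");              // nodes coordinate through the shared DB
        p.setProperty("org.quartz.jobStore.clusterCheckinInterval", "20000");  // ms between node check-ins
        return p;
    }

    public static void main(String[] args) {
        // With Quartz on the classpath: new org.quartz.impl.StdSchedulerFactory(clustered("myDS")).getScheduler()
        clustered("myDS").list(System.out);
    }
}
```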

Queuing systems - what is a good way to start up multiple workers?

How have you set up one or more worker scripts for queue-oriented systems?
How do you arrange to start up (and restart, if necessary) worker scripts as required? (I'm thinking of tools such as init.d, the Ruby-based 'god', DJB's Daemontools, etc.)
I'm developing an asynchronous queue/worker system, in this case using PHP & Beanstalkd (though the actual language and daemon aren't important). The tasks themselves are not too hard: encoding an array with the commands and parameters into JSON for transport through the Beanstalkd daemon, and picking them up in a worker script to action them as required.
There are a number of other similar queue/worker setups out there, such as Starling, Gearman, Amazon's SQS, and more 'enterprise'-oriented systems like IBM's MQ and RabbitMQ. If you run something like Gearman or SQS, how do you start and control the worker pool? The question is about the initial worker startup, and then being able to add extra workers and shut them down at will (though I can send a message through the queue to shut them down, as long as some 'watcher' won't automatically restart them). This is not a PHP problem; it's about plain Unix processes: setting up one or more processes to run on startup, or adding more workers to the pool.
A bash script that runs a script in a loop is already in place: it calls the PHP script, which collects and runs tasks from the queue, occasionally exiting so that it can clean itself up (it can also pause a few seconds on failure, or on a planned event). This works fine, and building the worker processes on top of that won't be very hard at all.
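One way to express that restart loop (rerun the worker whenever it exits, pausing after a failure), sketched in Java rather than bash. The worker is modelled as an `IntSupplier` returning an exit code so the example runs without spawning processes; in practice it could be something like `new ProcessBuilder("php", "worker.php").inheritIO().start().waitFor()`. Treating exit code 0 as a planned exit and the pause length are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;
import java.util.function.LongConsumer;

final class WorkerLoop {

    /**
     * Reruns the worker maxRuns times (a real loop would run forever).
     * Exit code 0 = planned exit, restart immediately; anything else = failure, pause first.
     */
    static void run(IntSupplier worker, int maxRuns, long pauseMillisOnFailure, LongConsumer sleeper) {
        for (int i = 0; i < maxRuns; i++) {
            int exitCode = worker.getAsInt();
            if (exitCode != 0) {
                sleeper.accept(pauseMillisOnFailure); // back off before restarting
            }
        }
    }

    public static void main(String[] args) {
        int[] exitCodes = {0, 1, 0};                 // planned exit, crash, planned exit
        int[] next = {0};
        List<Long> pauses = new ArrayList<>();
        run(() -> exitCodes[next[0]++], exitCodes.length, 3000, pauses::add); // record pauses instead of sleeping
        System.out.println(pauses);                  // [3000]
    }
}
```

A real `sleeper` would call `Thread.sleep(ms)` and restore the interrupt flag if interrupted.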
Getting a good worker controller system is about flexibility: starting one or two automatically on machine start, being able to add a couple more from the command line when the queue is busy, and shutting down the extras when they are no longer required.
I've been helping a friend who's working on a project that involves a Gearman-based queue that will dispatch various asynchronous jobs to various PHP and C daemons on a pool of several servers.
The workers have been designed to behave just like classic Unix/Linux daemons, thanks to simple shell scripts in /etc/init.d/ and commands like:
invoke-rc.d myWorker start|stop|restart|reload
This mechanism is simple and efficient. And as it relies on standard linux features, even people with a limited knowledge of your app can launch a daemon or stop one, if they know how it's called system-wise (aka "myWorker" in the above example).
Another advantage of this mechanism is that it makes worker pool management easy as well. You could have 10 daemons on your machine (myWorker1, myWorker2, ...) and have a "worker manager" start or stop them depending on the queue length. And as these commands can be run through ssh, you can easily manage several servers.
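A sketch of such a worker manager's decision step, assuming the daemons are named myWorker1 ... myWorkerN as above and that each worker can keep up with a fixed number of queued jobs (the `jobsPerWorker` figure and the min/max bounds are assumptions). It only computes which invoke-rc.d commands to issue; actually running them, locally or over ssh, is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class WorkerScaler {

    /** Desired worker count: enough to cover the queue, clamped to [min, max]. jobsPerWorker must be > 0. */
    static int desiredWorkers(int queueLength, int jobsPerWorker, int min, int max) {
        int needed = (queueLength + jobsPerWorker - 1) / jobsPerWorker; // ceiling division
        return Math.max(min, Math.min(max, needed));
    }

    /** Commands that move from `running` workers to `desired` workers (myWorker1..myWorkerN). */
    static List<String> commands(int running, int desired) {
        List<String> cmds = new ArrayList<>();
        for (int i = running + 1; i <= desired; i++) {
            cmds.add("invoke-rc.d myWorker" + i + " start");
        }
        for (int i = running; i > desired; i--) {
            cmds.add("invoke-rc.d myWorker" + i + " stop"); // stop the highest-numbered extras first
        }
        return cmds;
    }

    public static void main(String[] args) {
        int desired = desiredWorkers(45, 10, 2, 10);                  // 45 queued jobs -> 5 workers
        System.out.println(commands(2, desired));                     // start myWorker3..myWorker5
        System.out.println(commands(5, desiredWorkers(0, 10, 2, 10))); // idle queue -> stop myWorker5..myWorker3
    }
}
```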
This solution may sound cheap, but if you build it with well-coded daemons and reliable management scripts, I don't see why it would be less efficient than big-bucks solutions, for any average (as in "non critical") project.
Real message queuing middleware like WebSphere MQ or MSMQ offer "triggers" where a service that is part of the MQM will start a worker when new messages are placed into a queue.
AFAIK, no "web service" queuing system can do that, by the nature of the beast. However, I have only looked closely at SQS. There you have to poll the queue, and in Amazon's case overly eager polling is going to cost you real $$.
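One common way to keep that polling cost down is to back off while the queue stays empty. A minimal sketch of exponential backoff for the poll interval; the base and cap values are assumptions, and the poll call itself is not shown.

```java
final class PollBackoff {

    /** Delay before the next poll after `emptyPolls` consecutive empty responses: base * 2^n, capped at max. */
    static long nextDelayMillis(int emptyPolls, long baseMillis, long maxMillis) {
        int shift = Math.max(0, Math.min(emptyPolls, 30)); // 0 after a non-empty poll; bounded to avoid overflow
        return Math.min(maxMillis, baseMillis << shift);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 6; n++) {
            System.out.print(nextDelayMillis(n, 500, 20_000) + " "); // 500 1000 2000 4000 8000 16000 20000
        }
        System.out.println();
    }
}
```

Resetting the counter to 0 whenever a poll returns messages keeps latency low while work is flowing.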
I've recently been working on such a tool. It's not entirely finished (though it shouldn't take more than a few more days before I reach something I could call 1.0) and clearly not ready for production yet, but the important parts are already coded. Anybody can have a look at the code here: https://gitorious.org/workers_pool.
Supervisor is a good process-monitoring tool. It includes a web UI where you can monitor and manage workers.
Here is a simple config file for a worker.
[program:demo]
command=php worker.php ; php command to run worker file
numprocs=2 ; number of processes
process_name=%(program_name)s_%(process_num)03d ; unique name for each process if numprocs > 1
directory=/var/www/demo/ ; directory containing worker file
stdout_logfile=/var/www/demo/worker.log ; log file location
autostart=true ; auto start program when supervisor starts
autorestart=true ; auto restart program if it exits