I am attempting to build a job queue using two Redis master servers in two EC2 availability zones. All LPUSH operations are done in the application layer against both master machines, one in each AZ. Ideally I would use GitHub's Resque, but Resque does not seem to have any notion of multiple masters in multiple AZs.
I need to ensure only one worker works on a given job. Some workers will be in AZ 1A talking to the Redis machine in 1A, and some will be in AZ 1B talking to the machine in 1B. I need to avoid the scenario where a worker in 1A and a worker in 1B both dequeue the same job from their respective Redis masters and work on it simultaneously.
Does this worker pseudocode have any race conditions that I may have missed?
job_id = master1.brpoplpush("queue", "working", timeout: 0) # block until a job arrives
# SETNX requires a value; worker_id is any token unique to this worker (e.g. hostname + pid)
m1lock = master1.setnx("lock.#{job_id}", worker_id)
m2lock = master2.setnx("lock.#{job_id}", worker_id)
completed = master1.zscore("completed", job_id)
if completed
  # must have been completed just now on the other server; no-op
  master1.lrem("working", 0, job_id)
  master1.del("lock.#{job_id}")
  master2.del("lock.#{job_id}")
elsif !m1lock || !m2lock
  # other server is working on it? put it back at the end of our queue
  master1.lpush("queue", job_id)
  master1.lrem("working", 0, job_id)
  master1.del("lock.#{job_id}") if m1lock
  master2.del("lock.#{job_id}") if m2lock
else
  # we hold both locks and the job is not complete, so do the work
  do_work(job_id)
  now = Time.now.to_i
  master1.zadd("completed", now, job_id)
  master2.zadd("completed", now, job_id)
  master1.del("lock.#{job_id}")
  master2.del("lock.#{job_id}")
  master1.lrem("working", 0, job_id)
  master2.lrem("queue", 0, job_id) # not strictly necessary because of "completed"
end
What you are trying to do is, in essence, master-master replication. Whether it's a queue or anything else, Redis doesn't support it, and your pseudocode has race conditions.
Just doing:
m1lock = master1.setnx("lock.#{job_id}", worker_id)
m2lock = master2.setnx("lock.#{job_id}", worker_id)
is not atomic across the two masters, and since each master holds its own copy of the queue, the same job_id can be popped once per master. A worker in each AZ can grab the lock on its own master, fail the SETNX on the other, and both requeue; they can livelock that way indefinitely. And because the locks never expire, a worker that crashes between SETNX and DEL leaves the job locked forever.
I don't think Redis is ideal for your pattern, and I don't know of any queue server that can work that way. Then again, I don't know many such servers, so I'm sure one exists.
If you load-balance your work so that only one master gets a given job, it is possible, but then you essentially have two queues, not one.
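If you do go the one-master-per-job route, also give the lock an expiry so a crashed worker cannot wedge a job forever. Redis's SET with the NX and EX options does the acquire and the expiry atomically. A minimal sketch (Python redis-py here for brevity; the host name and TTL are illustrative):

import redis

r = redis.Redis(host="master1.example.internal")  # hypothetical host

def try_lock(job_id, worker_id, ttl=300):
    # SET key value NX EX ttl: atomic acquire-with-expiry, so a crashed
    # worker's lock disappears after ttl seconds instead of blocking the job
    return r.set(f"lock.{job_id}", worker_id, nx=True, ex=ttl)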
I'm curious: if you're already in the AWS environment, why not use Amazon's SQS service instead? I've worked with it in the past and realize it's a bit of a pain in the ass, but it's one of Amazon's most mature services and it's purpose-built for this scenario.
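For what it's worth, SQS's visibility timeout gives you the "only one worker holds a job at a time" behaviour out of the box: a received message is hidden from other consumers until it is deleted or the timeout lapses. A rough boto3 sketch (the queue URL and handler are placeholders):

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # hypothetical

def do_work(body):
    pass  # your job handler

while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                               WaitTimeSeconds=20)  # long poll
    for msg in resp.get("Messages", []):
        do_work(msg["Body"])
        # delete only after success; if the worker dies first, the message
        # becomes visible again after the visibility timeout and is retried
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])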
Are there any existing out-of-the-box job queue frameworks? The basic idea is:
1. Someone enqueues a job with status New.
2. (Multiple) workers get a job and work on it, marking the job as Taken. One job can be running on at most one worker.
3. Something monitors worker status; if a running job exceeds a predefined timeout, it is re-queued with status New (it could be a worker health issue).
4. Once a worker completes a task, it marks the task as Completed in the queue.
5. Something keeps cleaning up completed tasks. Or, at step #4, when a worker completes a task, it simply dequeues the task.
From my investigation, things like Kafka (pub/sub), MQs (push/pull and pub/sub), or caches (Redis, Memcached) are mostly sufficient for this work. However, they all require some development around their core functionality to become a fully functional job queue.
I also looked into relational DBs; the ones that support SELECT ... FOR UPDATE SKIP LOCKED are also good candidates, but this again requires a daemon between the DB and the workers, which means extra effort.
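For reference, the SKIP LOCKED pattern needs surprisingly little code around it. A minimal psycopg2 sketch, assuming a hypothetical jobs(id, status, payload) table:

import psycopg2

conn = psycopg2.connect("dbname=queue")  # assumed DSN

def claim_one_job():
    # each worker claims a different row: rows locked by another
    # transaction are skipped rather than waited on
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT id, payload FROM jobs
            WHERE status = 'New'
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED
        """)
        row = cur.fetchone()
        if row is None:
            return None  # nothing claimable right now
        cur.execute("UPDATE jobs SET status = 'Taken' WHERE id = %s", (row[0],))
        return row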
I also looked into cloud solutions (Azure Queue Storage, etc.); similar assessment.
So my question is: is there any out-of-the-box solution for job queues that is tailored and dedicated to this one thing, job queuing, without much setup effort?
Thanks
Take a look at Python Celery. https://docs.celeryproject.org/en/stable/getting-started/introduction.html
The default mode uses RabbitMQ as the message broker, but other options are available. Results can be stored in a DB if needed.
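To give a feel for how little glue Celery needs, a minimal sketch (the module name and broker URL are illustrative):

from celery import Celery

app = Celery("tasks", broker="amqp://localhost")  # RabbitMQ broker

@app.task(acks_late=True)  # unacknowledged tasks are redelivered if a worker dies
def process(job_id):
    ...  # the actual work

# producer side, from anywhere that imports this module:
#   process.delay(42)
# worker side, on each worker box:
#   celery -A tasks worker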
I have several instances of an "orchestrator" microservice that run on different nodes and execute Spring Batch jobs. Only one instance should be "active" and conduct the job at a time. The jobs are scheduled twice a day via the @Scheduled annotation with a cron expression.
So, the microservice tries to execute jobs with a single identifying JobParameter, a LocalDateTime.now() truncated to seconds to compensate for the time difference between the OpenShift nodes my instances run on.
The underlying DB is Postgres 12, whose transaction isolation level is set to repeatable read.
The problem seems impossible to me, but it happens and reproduces every time: job execution fails on each microservice instance with a DuplicateKeyException on the composite PK, which is (not surprisingly) the job name and the identifying parameter's hash.
The question is: how is this possible, and what am I missing? Any ideas?
Sorry for such a late answer. There was no problem at all; the locks work correctly regardless of the transaction isolation level. We have two OpenShift clusters, an active one and an inactive one. The jobs were running on the "inactive" nodes, which are called that only because no client traffic is routed to them. As it turned out, production support had no access to the "inactive" nodes' logs :)
I’m finally dipping my toes in the kubernetes pool and wanted to get some advice on the best way to approach a problem I have:
Tech we are using:
GCP
GKE
GCP Pub/Sub
We need to do bursts of batch processing spread out across a fleet and have decided on the following approach:
New raw data flows in
A node analyses this and breaks the data up into manageable portions which are pushed onto a queue
We have a cluster with Autoscaling On and Min Size ‘0’
A Kubernetes job spins up a pod for each new message on this cluster
When pods can’t pull any more messages they terminate successfully (a rough sketch of this loop follows)
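For concreteness, the per-pod worker loop could look roughly like this with the google-cloud-pubsub client; the project, subscription, and handler names are placeholders:

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path("my-project", "work-items")  # hypothetical

def process(data):
    pass  # the actual batch work

while True:
    resp = subscriber.pull(subscription=sub_path, max_messages=10)
    if not resp.received_messages:
        break  # queue drained: exit 0 so the Job pod counts as completed
    for received in resp.received_messages:
        process(received.message.data)
    subscriber.acknowledge(
        subscription=sub_path,
        ack_ids=[received.ack_id for received in resp.received_messages],
    )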
The question is:
What is the standard approach for triggering jobs such as this?
Do you create a new job each time, or are jobs meant to be long-lived and re-run?
I have only seen examples that use a YAML file; however, we would probably want the node that did the portioning of work to create the job, since it knows how many parallel pods should be run. Would it be recommended to use the Python SDK to create the job spec programmatically? Or, if jobs are long-lived, would you simply hit the k8s API, modify the number of parallel pods required, and re-run the job?
Jobs in Kubernetes are meant to be short-lived and are not designed to be reused: they are for run-once, run-to-completion workloads. Typically they are assigned a specific task, i.e. to process a single queue item.
However, if you want to process multiple items in a work queue with a single instance, it is generally advisable to use a Deployment instead: scale a pool of workers that keep processing items from the queue, sizing the pool according to the number of items in the queue. If there are no work items remaining, you can scale the Deployment to 0 replicas and scale back up when there is work to be done.
To create and control your workloads in Kubernetes, the best practice is to use the Kubernetes SDK. While you can generate YAML files and shell out to another tool like kubectl, using the SDK simplifies configuration and error handling, and it also allows for simpler introspection of the resources in the cluster.
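For example, the node that portions the work could create the Job directly with the official Python client; the image name and namespace below are placeholders:

from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in a pod

def create_worker_job(name, parallelism):
    container = client.V1Container(
        name="worker",
        image="gcr.io/my-project/worker:latest",  # hypothetical image
    )
    spec = client.V1JobSpec(
        parallelism=parallelism,  # pods running concurrently
        completions=parallelism,  # total successful pods required
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(restart_policy="Never", containers=[container])
        ),
    )
    job = client.V1Job(api_version="batch/v1", kind="Job",
                       metadata=client.V1ObjectMeta(name=name), spec=spec)
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)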
I have a question about Rundeck's features: is it possible to include conditions within job execution? As it is quite difficult to explain, here is an example:
You have 2 redundant firewalls in your network, and you implement a job, 'job1', whose aim is to update the firewalls' configuration. Suppose the master is down; in that case you do not want to update the slave, because the slave would have to restart and no firewall would be running for a short time. So what I want to do is check, before running the update, that neither of my firewalls is out of service: if the master is down, do not update the slave.
So, is it possible to involve multiple nodes within one job?
Thanks for helping!
Create a job which pings both firewalls; if both are up, this job succeeds. Then create another job whose workflow runs this health-check job before the update job, and configure the workflow to proceed only if the first step succeeds. That should solve your problem.
I have a Condor cluster with multiple active nodes.
But when I submit a job, it only runs on a single node (i.e. the master node). I'm aware that Condor automatically distributes jobs based on available resources.
But what if I want to force Condor to make use of all the nodes, just for the sake of comparing processing time on multiple nodes vs. a single node?
I have tried adding requirements = Machine == "hostname1" && Machine == "hostname2" to the submit file, but it isn't working.
Depending on what you're trying to do, you might want to use the parallel universe as outlined here: http://research.cs.wisc.edu/htcondor/manual/current/2_9Parallel_Applications.html
With a parallel universe job you indicate the machine count via machine_count and only need to queue a single task.
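A minimal parallel-universe submit description might look like this (the executable path and machine count are placeholders):

universe = parallel
executable = /path/to/my_program   # placeholder
machine_count = 4                  # claim 4 machines for the one task
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue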
I am afraid I am not fully understanding what you are asking, but let's see if I can help somehow. I can see a few scenarios:
Condor is only scheduling your jobs to run on the master node, regardless of how many machines are available.
Condor is scheduling jobs on all available machines. However, what you are trying to do is get a particular job to make use of more than one machine.
In case 1, something fishy is going on with either your submit file or your pool setup. I will assume that condor_status returns more than one machine and that your pool setup is OK. The typical gotcha in this case is the following: if you do not specify a Requirements expression for your job, Condor inserts one for you. By default, Condor requests that the job run on a machine with the same OS and architecture as the submit node. This one bit me a few times with heterogeneous pools ;-)
In case 2, you will have to make sure that your executable can make use of multiple machines (e.g. by way of MPI), and you need to tell Condor about it. One way to do that is to use the parallel universe. Another way is to use a classic master/worker architecture where the workers are persistent Condor jobs.
Condor is limited in that it can only execute a command (via system()). If your program does not create many subtasks, you will not see any speed improvement.
Please post a short snippet of your job description (file).