How to force Condor to submit a job to all nodes in the cluster?

I have a condor cluster with multiple nodes active.
But when I submit a job, it only runs on a single node (i.e. the master node). I'm aware that Condor automatically distributes jobs based on available resources.
But what if I want to force Condor to make use of all the nodes, just for the sake of comparing processing time when running on multiple nodes versus a single node?
I have tried adding requirements = Machine == "hostname1" && Machine == "hostname2" to the submit file, but it isn't working.

Depending on what you're trying to do, you might want to use the parallel universe as outlined here: http://research.cs.wisc.edu/htcondor/manual/current/2_9Parallel_Applications.html
With a parallel universe job you indicate the machine count via machine_count and only need to queue a single task.
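A minimal sketch of such a submit file (the executable name and the machine count are placeholders, not taken from the question):

# Sketch only: my_benchmark stands in for your own program.
universe      = parallel
executable    = my_benchmark
machine_count = 4
output        = out.$(Node)
error         = err.$(Node)
log           = job.log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue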

I am afraid I'm not fully understanding what you are asking. Let's see if I can help somehow. I can see a few scenarios:
Condor is only scheduling your jobs to run on the master node, regardless of how many machines are available.
Condor is scheduling jobs on all available machines. However what you are trying to do is get a particular job to make use of more than one machine.
In case 1, something fishy is going on with either your submit file or your pool setup. I will assume that condor_status returns more than one machine and that your pool setup is OK. The typical gotcha in this case is the following: if you do not specify a requirements expression for your job, Condor will insert one for you. By default, Condor will request that the job run on a machine with the same OS and architecture as the submit node. That one has bitten me a few times with heterogeneous pools ;-)
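As a sketch (the attribute values are illustrative; adjust them to whatever condor_status reports for your pool), an explicit requirements line in the submit file overrides the default one:

# Illustrative only: match any 64-bit Linux machine in the pool instead of
# inheriting the submit node's OS/architecture.
requirements = (Arch == "X86_64") && (OpSys == "LINUX")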
In case 2, you will have to make sure that your executable can make use of multiple machines (e.g. by way of MPI), and you need to tell Condor about it. One way to do that is to use the parallel universe. Another way is to use a classic master/worker architecture where the workers are persistent Condor jobs.

Condor is limited in that it can only execute a command (as with system()). If your program does not spawn multiple subtasks, you will not see any speed improvement.
Please post a short snippet of your job description (file).

Related

Does assigning more nodes to a job on a SLURM server increase available RAM?

I am working with a program that needs a lot of RAM. Currently I am running it on a SLURM cluster. Each node has 125GB of RAM. When I submit the job to a single node, it eventually fails as it runs out of memory. My rather naive question, as I am new to working on servers, is:
Does assigning more nodes with the --nodes flag increase the RAM available to the submitted job?
For example:
When assigning 10 nodes instead of 1, with the directive below, the program fails at the same point as with one node.
#SBATCH --nodes=10
Is there some other way to combine RAM from multiple nodes for a single job?
Any and all advice is welcome!
That depends on your program, but most likely no.
To use multiple nodes on a Slurm cluster (or any cluster, for that matter), your program needs to be set up in a very specific way, i.e. you need inter-node communication. This is usually done via MPI, and the whole program has to be designed around it.
So if your program uses MPI, it may be able to split the workload over several nodes. Even then, lower memory use is not guaranteed, as that is usually not the goal of such parallelization.
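As an illustration (my_mpi_program is a hypothetical MPI-aware executable; the resource numbers are placeholders), a multi-node batch script would look roughly like this. The combined memory of the nodes only helps if the program itself distributes its data via MPI:

#!/bin/bash
# Sketch only: my_mpi_program stands in for an MPI-aware executable.
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1
#SBATCH --mem=120G
srun ./my_mpi_program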

Is it possible to run a single container Flink cluster in Kubernetes with high-availability, checkpointing, and savepointing?

I am currently running a Flink session cluster (Kubernetes, 1 JobManager, 1 TaskManager, Zookeeper, S3) in which multiple jobs run.
As we are working on adding more jobs, we are looking to improve our deployment and cluster management strategies. We are considering migrating to job clusters; however, there are reservations about the number of containers that would be spawned. One container per job is not an issue, but two containers (1 JM and 1 TM) per job raises concerns about memory consumption. Several of the jobs need high availability and the ability to use checkpoints and restore from/take savepoints, as they aggregate events over a window.
From reading the documentation and searching around, I haven't found anything that states whether or not what we are considering is actually possible.
Is it possible to do any of these three things:
run both the JobManager and TaskManager as separate processes in the same container and have that serve as the Flink cluster, or
run the JobManager and TaskManager as literally the same process, or
run the job as a standalone JAR with the ability to recover from/take checkpoints and the ability to take a savepoint and restore from that savepoint?
(If anyone has any better ideas, I'm all ears.)
One of the responsibilities of the job manager is to monitor the task manager(s) and initiate restarts when failures occur. That works nicely in containerized environments when the JM and TMs are in separate containers; otherwise it seems like you're asking for trouble. Keeping the TMs separate also makes sense if you are ever going to scale up, though that may be moot in your case.
What might be workable, though, would be to run the job using a LocalExecutionEnvironment (so that everything is in one process -- this is sometimes called a Flink minicluster). This path strikes me as feasible, if you're willing to work at it, but I can't recommend it. You'll have to somehow keep track of the checkpoints, and arrange for the container to be restarted from a checkpoint when things fail. And there are other things that may not work very well -- see this question for details. The LocalExecutionEnvironment wasn't designed with production deployments in mind.
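A rough sketch of that minicluster approach is below (the checkpoint interval and pipeline are placeholders; as noted above, this is not a setup the Flink docs recommend for production):

// Sketch only: everything runs inside one JVM via a local execution environment.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalMiniClusterJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment();
        env.enableCheckpointing(60_000L);   // placeholder checkpoint interval (ms)
        env.fromElements(1, 2, 3).print();  // stand-in for the real pipeline
        env.execute("local-minicluster-job");
    }
}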
What I'd suggest you explore instead is to see how far you can go toward making the standard, separate container solution affordable. For starters, you should be able to run the JM with minimal resources, since it doesn't have much to do.
Check out this operator, which automates the lifecycle of deploying and managing Flink in Kubernetes. The project is in beta, but you can still get some ideas about how to do it, or use the operator directly if it fits your requirements. Here the JobManager and TaskManager are separate Kubernetes deployments.

Rundeck: 1 job, multiple nodes involved?

I have a question about Rundeck features. Is it possible to include conditions within job execution? As it is quite difficult to explain, I provide an example:
You have 2 redundant firewalls in your network. You implement a job 'job1' whose aim is to update your firewalls' configuration. Suppose the master is down; in that case you do not want to update the slave, because the slave would have to restart and there would be no firewall running for a short time. So, what I want to do is test, before running the update, that neither of my firewalls is out of service. If the master is down, do not update the slave.
So, is it possible to involve multiple nodes within one job?
Thanks for helping!
Create a job which pings both firewalls; if both are up, this job succeeds. Then create another job whose workflow runs this check job before the update job, and make the workflow proceed only if the check succeeds. That should solve your problem.
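One way to sketch that idea as a single Rundeck job definition in YAML (hostnames, the node filter, and the update command are all placeholders; keepgoing: false makes the workflow stop if the ping step fails):

# Sketch only: hostnames, tags, and commands are placeholders.
- name: update-firewall-config
  nodefilters:
    filter: 'tags: firewall'
  sequence:
    keepgoing: false
    strategy: node-first
    commands:
      - exec: ping -c 3 firewall-master && ping -c 3 firewall-slave
      - exec: /usr/local/sbin/apply-firewall-update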

Persistent storage for Apache Mesos

Recently I discovered Apache Mesos.
It all looks amazing in the demos and examples, and I can easily imagine running stateless jobs on it - that fits the whole idea naturally.
But how does one deal with long-running jobs that are stateful?
Say I have a cluster that consists of N machines (and is scheduled via Marathon), and I want to run a PostgreSQL server there.
That's it - at first I don't even need it to be highly available; just a single (Dockerized) job that hosts a PostgreSQL server.
1- How would one organize it? Constrain the server to a particular cluster node? Use some distributed FS?
2- DRBD, MooseFS, GlusterFS, NFS, CephFS: which of those play well with Mesos and services like Postgres? (I'm thinking here of the possibility that Mesos/Marathon could relocate the service if it goes down.)
3- Please tell me if my approach is wrong in terms of philosophy (a DFS for data servers and some kind of switchover for servers like Postgres on top of Mesos).
Question largely copied from Persistent storage for Apache Mesos, asked by zerkms on Programmers Stack Exchange.
Excellent question. Here are a few upcoming features in Mesos to improve support for stateful services, and corresponding current workarounds.
Persistent volumes (0.23): When launching a task, you can create a volume that exists outside of the task's sandbox and will persist on the node even after the task dies/completes. When the task exits, its resources -- including the persistent volume -- can be offered back to the framework, so that the framework can launch the same task again, launch a recovery task, or launch a new task that consumes the previous task's output as its input.
Current workaround: Persist your state in some known location outside the sandbox, and have your tasks try to recover it manually. Maybe persist it in a distributed filesystem/database, so that it can be accessed from any node.
Disk Isolation (0.22): Enforce disk quota limits on sandboxes as well as persistent volumes. This ensures that your storage-heavy framework won't be able to clog up the disk and prevent other tasks from running.
Current workaround: Monitor disk usage out of band, and run periodic cleanup jobs.
Dynamic Reservations (0.23): Upon launching a task, you can reserve the resources your task uses (including persistent volumes) to guarantee that they are offered back to you upon task exit, instead of going to whichever framework is furthest below its fair share.
Current workaround: Use the slave's --resources flag to statically reserve resources for your framework upon slave startup.
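For instance (the role name and amounts here are placeholders, not from the original answer), a static reservation at slave startup could look like:

# Sketch only: reserves resources for a hypothetical "stateful" role.
mesos-slave --master=zk://zk1:2181/mesos \
            --resources="cpus(stateful):4;mem(stateful):16384;disk(stateful):100000"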
As for your specific use case and questions:
1a) How would one organize it? You could do this with Marathon, perhaps creating a separate Marathon instance for your stateful services, so that you can create static reservations for the 'stateful' role, such that only the stateful Marathon will be guaranteed those resources.
1b) Constrain a server to a particular cluster node? You can do this easily in Marathon, constraining an application to a specific hostname, or to any node with a specific attribute value (e.g. NFS_Access=true). See Marathon Constraints. If you only wanted to run your tasks on a specific set of nodes, you would only need to create the static reservations on those nodes. And if you need discoverability of those nodes, you should check out Mesos-DNS and/or Marathon's HAProxy integration.
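As a sketch, a Marathon app definition pinning a PostgreSQL container to one node via a hostname constraint might look like this (the id, image, resources, and hostname are all placeholders):

{
  "id": "/postgres",
  "cpus": 2,
  "mem": 4096,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "postgres:9.6" }
  },
  "constraints": [["hostname", "CLUSTER", "db-node-01.example.com"]]
}

Swapping the constraint for [["NFS_Access", "CLUSTER", "true"]] would instead restrict the app to nodes carrying that attribute.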
1c) Use some distributed FS? The data replication provided by many distributed filesystems would guarantee that your data can survive the failure of any single node. Persisting to a DFS would also provide more flexibility in where you can schedule your tasks, although at the cost of the difference in latency between network and local disk. Mesos has built-in support for fetching binaries from HDFS uris, and many customers use HDFS for passing executor binaries, config files, and input data to the slaves where their tasks will run.
2) DRBD, MooseFS, GlusterFS, NFS, CephFS? I've heard of customers using CephFS, HDFS, and MapRFS with Mesos. NFS would seem an easy fit too. It really doesn't matter to Mesos what you use, as long as your task knows how to access it from whatever node it's placed on.
Hope that helps!

Is there a way to make jobs in Jenkins mutually exclusive?

I have a few jobs in Jenkins that use Selenium to modify a database through a website's front end. If some of these jobs run at the same time, errors due to dirty reads can result. Is there a way to force certain jobs in Jenkins to be unable to run at the same time? I would prefer not to have to place or pick up a lock on the database, which could be read or modified by any number of users who are also testing.
You want the Throttle Concurrent Builds plugin which lets you define global and per-node semaphores.
The Locks and Latches plugin is being deprecated in favor of Throttle Concurrent Builds.
I've tried both the Locks and Latches plugin and the Port Allocator plugin as ways to achieve what you're trying to do. Neither worked reliably for me. Locks and Latches worked some of the time, but I'd occasionally get hung jobs. Using Port Allocator as a hack will work unless you have multiple Jenkins nodes, but the config overhead is kind of high. What I've ultimately settled upon is another hack, but it works reliably and uses core Jenkins features (no plugins):
create a slave node running on the same box as the master (or not, if you have lots of boxes)
give this slave a single executor (this is the key)
tie the 2 (or n) jobs that must not run together to this new slave node
optionally set the slave's usage to 'tied jobs only' if your other jobs would misbehave were they to run on the new slave
Since the slave has only one executor, the jobs tied to it can never run together.
If you regard the database as a shared resource that can only be used exclusively, then this fits the use case of the Lockable Resources plugin.
It is being actively developed and improved and is very versatile.
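For instance, in a Pipeline job the plugin's lock step can wrap the critical stage (the resource name and test script below are placeholders; the resource itself has to be declared in the global Lockable Resources configuration):

// Sketch only: 'frontend-db' and the test script are placeholders.
pipeline {
    agent any
    stages {
        stage('Selenium tests') {
            steps {
                lock(resource: 'frontend-db') {
                    sh './run-selenium-tests.sh'
                }
            }
        }
    }
}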