Time taken for completion of PBS jobs - hpc

On a PBS system I have access to, I'm running some jobs with the -W x=NACCESSPOLICY:SINGLEJOB flag and, anecdotally, the same jobs seem to take about 10% longer with this flag than without it. Is this expected behaviour? If so, it surprises me: I'd have thought that having sole access to a whole node would, if anything, slightly decrease the run time of a job, since more memory is available to it.


How can the kernel run all the time?

How can the kernel run all the time when a CPU can execute only one process at a time?
That is, if the kernel is occupying the CPU all the time, how do other processes get to run?
Please explain
Thank You
In the same way that you can run multiple userspace processes at the same time: only one of them is actually using the CPU at any given time. Interrupts (typically from a timer) force the running process to give up the CPU so that another one can run.
Code that is part of the operating system is no different here (except that it is in control of setting up this scheduling in the first place).
You also have to distinguish between processes run by the OS in the background (I suppose that is what you are talking about here), and system calls (which are being run as part of "normal" processes that temporarily switch into supervisor mode).
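To make the time-slicing idea concrete, here is a toy sketch in Python. It is not how a real kernel is implemented: generators stand in for processes, each `yield` stands in for a timer interrupt, and the names `process` and `round_robin` are made up for this illustration.

```python
from collections import deque


def process(name, steps):
    """Stand-in for a process: each yield marks the point where a
    (simulated) timer interrupt hands the CPU back to the scheduler."""
    for i in range(steps):
        yield f"{name}:{i}"


def round_robin(procs):
    """Run the given generators one time slice at a time until all finish."""
    ready = deque(procs)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))  # only this task holds the "CPU" now
            ready.append(current)        # preempted: back of the ready queue
        except StopIteration:
            pass                         # task finished: drop it
    return trace


trace = round_robin([
    process("kernel_thread", 2),
    process("user_a", 3),
    process("user_b", 1),
])
print(trace)
```

The kernel thread in this toy model is scheduled like any other task; it only appears to run "all the time" because it keeps getting slices.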

Why do my results become un-reproducible when run from qsub?

I'm running MATLAB on a cluster. When I run my .m script from an interactive MATLAB session on the cluster, my results are reproducible. But when I run the same script from a qsub command, as part of an array job away from my watchful eye, I get believable but unreproducible results. The .m files are doing exactly the same thing, including saving the results as .mat files.
Does anyone know why the scripts give reproducible results when run one way and unreproducible results when run the other way?
Is this only a problem with reproducibility, or is it indicative of inaccurate results?
Thanks to spuder for a helpful answer. Just in case anyone stumbles upon this and is interested here is some further information.
If you use more than one thread in MATLAB jobs, they may steal resources from other jobs, which plays havoc with the results. So you have 2 options:
1. Select exclusive access to a node. The cluster I am using does not currently allow parallel array jobs, so for me this was very wasteful: I took a whole node but used it in serial.
2. Ask MATLAB to run with the -singleCompThread startup option. This may make your script take longer to complete, but it gets jobs through the queue faster.
There are a lot of variables at play. Ruling out transient issues such as network performance and load, here are a couple of possible explanations:
You are getting assigned a different batch of nodes when you run an interactive job from when you use qsub.
I've seen some sites that assign a policy of 'exclusive' to the nodes that run interactive jobs and 'shared' to nodes that run queued 'qsub' jobs. If that is the case, then you will almost always see better performance on the exclusive nodes.
Another answer might be that the interactive jobs are assigned to nodes that have less network congestion.
Additionally, if you are requesting multiple nodes and you happen to land on nodes whose traffic traverses multiple hops, then you could be seeing significant network slowdowns. The solution would be for the cluster administrator to set up nodesets.
Are you using multiple nodes for the job? How are you requesting the resources?

Celery: limit memory usage (large number of django installations)

We have a setup with a large number of separate Django installations on a single box. Each of these has its own code base and Linux user.
We're using Celery for some asynchronous tasks.
Each of the installations has its own setup for Celery, i.e. its own celeryd and worker.
The number of asynchronous tasks per installation is limited, and they are not time-critical.
When a worker starts, it takes about 30 MB of memory. When it has run for a while, this amount may grow (presumably due to fragmentation).
The last point has already been (somewhat) solved by setting --maxtasksperchild to a low number (say 10). This ensures a restart after 10 tasks, after which the memory at least goes back to 30 MB.
However, each celeryd is still taking up a lot of memory, since the minimum number of workers appears to be 1 as opposed to 0. I also imagine that running python manage.py celery worker does not lead to the smallest possible footprint for the celeryd, since the full stack is loaded even if the only thing that happens is checking for tasks.
In an ideal setup, I'd like to see the following: a process with a very small memory footprint (100k or so) watches the queue for new tasks. When such a task arrives, it spins up the (heavy) full Django stack in a separate process. When the worker is done, the heavy process is spun down.
Is such a setup configurable using (somewhat) standard Celery? If not, what points of extension are there?
We're (currently) using Celery 3.0.17 and the associated django-celery.
Just to make sure I understand - you have a lot of different django codebases, each with their own celery, and they take up too much memory when running on a single box simultaneously, all waiting for a celery job to come down the pipe? How many celery instances are we talking about here?
In my experience, you're using django celery in a very different way than it was designed for - all of your different django projects should be condensed to a few (or a single) project(s), composed of multiple applications. Then you set up a small number of queues to field celery tasks from the different apps - this way, you only have as many dormant celery threads taking up 30mb as you have queues, and a single queue can handle multiple tasks (from multiple apps if you want). The memory issue should go away.
To reiterate - you only need one celeryd, driving multiple workers. This way your bottleneck is job concurrency, not dormant memory needs.
Why do you need so many django installations? Please let me know if I'm missing something, or if you need clarification.

Run time memory of perl script

I have a Perl script that gets killed by an automated job whenever a high-priority process comes along, because my script runs about 300 parallel jobs for downloading data and consumes a lot of memory. I want to find out how much memory it uses at run time, so that I can request more memory before scheduling the script. Alternatively, if some tool can show me which portion of my code uses the most memory, I can optimize that code.
Regarding OP's comment on the question, if you want to minimize memory use, definitely collect and append the data one row/line at a time. If you collect all of it into a variable at once, that means you need to have all of it in memory at once.
Regarding the question itself, you may want to look into whether it's possible to have the Perl code just run once (rather than running 300 separate instances) and then fork to create your individual worker processes. When you fork, the child processes will share memory with the parent much more efficiently than is possible for unrelated processes, so you will, e.g., only need to have one copy of the Perl binary in memory rather than 300 copies.
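The fork-from-one-parent pattern, together with a rough way of measuring peak memory, can be sketched as follows. This is a minimal illustration in Python rather than Perl (Perl exposes the same underlying fork() call), it assumes a Unix-like system, and the helper names `run_forked_workers` and `peak_child_rss` are made up for this example.

```python
import os
import resource


def run_forked_workers(n_workers, work):
    """Fork n_workers children from one parent; each child runs work(i).

    Returns the list of child exit codes (-1 if a child was killed).
    """
    pids = []
    for i in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Child: starts out sharing the parent's memory pages
            # (copy-on-write) instead of loading everything again.
            code = 0
            try:
                work(i)
            except Exception:
                code = 1
            os._exit(code)  # leave without running the parent's cleanup
        pids.append(pid)

    exit_codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        exit_codes.append(os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1)
    return exit_codes


def peak_child_rss():
    """Peak resident set size of the largest waited-for child.

    The unit is kilobytes on Linux (bytes on macOS).
    """
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


def fake_download(i):
    # Placeholder for the real per-job work (hypothetical).
    data = [i] * 100_000
    return len(data)


exit_codes = run_forked_workers(4, fake_download)
peak = peak_child_rss()
print("exit codes:", exit_codes)
print("peak child RSS:", peak)
```

The reported figure is the maximum over children that have already been waited for, so it gives a per-worker peak rather than the total across all workers.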

Does a modified command invocation tool – which dynamically regulates a job pool according to load – already exist?

Fellow Unix philosophers,
I programmed some tools in Perl that have a part which can run in parallel. I outfitted them with a -j (jobs) option like make and prove have because that's sensible. However, soon I became unhappy with that for two reasons.
I specify --jobs=2 because I have two CPU cores, but I should not need to tell the computer information that it can figure out by itself.
Runs of the tool rarely occupy more than 20% CPU (I/O load is also low), so time is wasted by not utilising the CPU better.
I hacked some more to add load measuring: additional jobs are spawned while there's still »capacity«, until a load threshold is reached, at which point the number of jobs stays more or less constant. When, during the course of a run, other processes with higher priority demand more CPU, fewer new jobs are spawned over time and the number of jobs drops accordingly.
Since this responsibility was repeated code in the tools, I factored out the scheduling aspect into a stand-alone tool following the spirit of nice et al. The parallel tools are quite dumb now: they only have signal handlers through which they are told to increase or decrease the job pool, whereas the intelligence of measuring load and figuring out when to adjust the pool resides in the scheduler.
Taste of the tentative interface (I also want to provide sensible defaults so options can be omitted):
run-parallel-and-schedule-job-pool \
--cpu-load-threshold=90% \
--disk-load-threshold='300 KiB/s' \
--network-load-threshold='1.2 MiB/s' \
--increase-pool='/bin/kill -USR1 %PID' \
--decrease-pool='/bin/kill -USR2 %PID' \
-- \
parallel-something-master --MOAR-OPTIONS
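The scheduling decision described above can be sketched roughly like this. It is written in Python purely for illustration and is not the tool from the question: the function names, the `slack` hysteresis band, and the use of the 1-minute load average as a stand-in for CPU load are all assumptions. Only the USR1/USR2 convention is taken from the interface shown above.

```python
import os
import signal


def pool_adjustment(load, threshold, slack=0.1):
    """Return +1 to grow the pool, -1 to shrink it, 0 to leave it alone.

    `slack` is a hypothetical dead band below the threshold that keeps
    the pool from oscillating around the limit.
    """
    if load > threshold:
        return -1
    if load < threshold * (1 - slack):
        return +1
    return 0


def current_cpu_load():
    """1-minute load average divided by the CPU count.

    This is only a rough proxy for CPU utilisation, but it means the
    number of cores does not have to be supplied by the user.
    """
    return os.getloadavg()[0] / (os.cpu_count() or 1)


def signal_for(adjustment):
    """Map a decision to the signal convention from the interface above."""
    return {+1: signal.SIGUSR1, -1: signal.SIGUSR2}.get(adjustment)


def regulate_once(pid, threshold=0.9):
    """Measure the load once and nudge the job master accordingly."""
    sig = signal_for(pool_adjustment(current_cpu_load(), threshold))
    if sig is not None:
        os.kill(pid, sig)
    return sig
```

A real scheduler would call something like `regulate_once` in a loop with a sleep between measurements, and would combine it with the disk and network thresholds.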
Before I put effort into the last 90%, do tell me, am I duplicating someone else's work? The concept is quite obvious, so it seems it should have been done already, but I couldn't find this as a single responsibility stand-alone tool, only as deeply integrated part of larger many-purpose sysadmin suites.
Bonus question: I already know runN and parallel. They do parallel execution, but do not have the dynamic scheduling (niceload goes into that territory, but is quite primitive). If, against my expectations, the stand-alone tool does not exist yet, am I better off extending runN myself or filing a wish against parallel?
Some of our users are quite happy with Condor. It is a system for dynamically distributing jobs to other workstations and servers according to their free computing resources.