bind9 (named) does not start in multi-threaded mode - multicore

From the bind9 man page, I understand that the named process starts one worker thread per CPU if it was able to determine the number of CPUs. If its unable to determine, a single worker thread is started.
My question is how does it calculate the number of CPUs? I presume by CPU, it means cores. The Linux machine I work is customized and has kernel 2.6.34 and does not support lscpu or nproc utilities. named is starting a single thread even if i give -n 4 option. Is there any other way to force named to start multiple threads?
Thanks in advance.

Related

Does assigning more nodes to a job on a SLURM server increase available RAM?

I am working with a program that needs a lot RAM. Currently I am running it on a SLURM cluster. Each node has 125GB RAM. When submitting the job to a single node it eventually fails as it runs out of memory. My rather naive question, as I am new to working on servers, is:
Does assigning more nodes with the command --nodes flag increase available RAM for the submitted job?
For example:
When assigning 10 nodes instead of 1, with the command below, the program fails at the same point as with with one node.
#SBATCH --nodes=10
Is there some other way to combine RAM from multiple nodes for a single job?
Any and all advice is welcome!
That depends on your program, but most likely no.
To use multiple nodes on a Slurm Cluster (or any cluster, for that matter), your program needs to be set up in very specific way, ie. you need inter node communictaion. This is usually done via MPI and the whole program has to be designed around it.
So if your program uses MPI it may be able to split the workload over several nodes. And even that does not guarantee lower memory as that is usually not the goal of such a parallelization.

Queries regarding celery scalability

I have few questions regarding celery. Please help me with that.
Do we need to put the project code in every celery worker? If yes, if I am increasing the number of workers and also I am updating my code, what is the best way to update the code in all the worker instances (without manually pushing code to every instance everytime)?
Using -Ofair in celery worker as argument disable prefetching in workers even if have set PREFETCH_LIMIT=8 or so?
IMPORTANT: Does rabbitmq broker assign the task to the workers or do workers pull the task from the broker?
Does it make sense to have more than one celery worker (with as many subprocesses as number of cores) in a system? I see few people run multiple celery workers in a single system.
To add to the previous question, whats the performance difference between the two scenarios: single worker (8 cores) in a system or two workers (with concurrency 4)
Please answer my questions. Thanks in advance.
Do we need to put the project code in every celery worker? If yes, if I am increasing the number of workers and also I am updating my code, what is the best way to update the code in all the worker instances (without manually pushing code to every instance everytime)?
Yes. A celery worker runs your code, and so naturally it needs access to that code. How you make the code accessible though is entirely up to you. Some approaches include:
Code updates and restarting of workers as part of deployment
If you run your celery workers in kubernetes pods this comes down to building a new docker image and upgrading your workers to the new image. Using rolling updates this can be done with zero downtime.
Scheduled synchronization from a repository and worker restarts by broadcast
If you run your celery workers in a more traditional environment or for some reason you don't want to rebuild whole images, you can use some central file system available to all workers, where you update the files e.g. syncing a git repository on a schedule or by some trigger. It is important you restart all celery workers so they reload the code. This can be done by remote control.
Dynamic loading of code for every task
For example in omega|ml we provide lambda-style serverless execution of
arbitrary python scripts which are dynamically loaded into the worker process.
To avoid module loading and dependency issues it is important to keep max-tasks-per-child=1 and use the prefork pool. While this adds some overhead it is a tradeoff that we find is easy to manage (in particular we run machine learning tasks and so the little overhead of loading scripts and restarting workers after every task is not an issue)
Using -Ofair in celery worker as argument disable prefetching in workers even if have set PREFETCH_LIMIT=8 or so?
-O fair stops workers from prefetching tasks unless there is an idle process. However there is a quirk with rate limits which I recently stumbled upon. In practice I have not experienced a problem with neither prefetching nor rate limiting, however as with any distributed system it pays of to think about the effects of the asynchronous nature of execution (this is not particular to Celery but applies to all such such systems).
IMPORTANT: Does rabbitmq broker assign the task to the workers or do workers pull the task from the broker?
Rabbitmq does not know of the workers (nor do any of the other broker supported by celery) - they just maintain a queue of messages. That is, it is the workers that pull tasks from the broker.
A concern that may come up with this is what if my worker crashes while executing tasks. There are several aspects to this: There is a distinction between a worker and the worker processes. The worker is the single task started to consume tasks from the broker, it does not execute any of the task code. The task code is executed by one of the worker processes. When using the prefork pool (which is the default) a failed worker process is simply restarted without affecting the worker as a whole or other worker processes.
Does it make sense to have more than one celery worker (with as many subprocesses as number of cores) in a system? I see few people run multiple celery workers in a single system.
That depends on the scale and type of workload you need to run. In general CPU bound tasks should be run on workers with a concurrency setting that doesn't exceed the number of cores. If you need to process more of these tasks than you have cores, run multiple workers to scale out. Note if your CPU bound task uses more than one core at a time (e.g. as is often the case in machine learning workloads/numerical processing) it is the total number of cores used per task, not the total number of tasks run concurrently that should inform your decision.
To add to the previous question, whats the performance difference between the two scenarios: single worker (8 cores) in a system or two workers (with concurrency 4)
Hard to say in general, best to run some tests. For example if 4 concurrently run tasks use all the memory on a single node, adding another worker will not help. If however you have two queues e.g. with different rates of arrival (say one for low frequency but high-priority execution, another for high frequency but low-priority) both of which can be run concurrently on the same node without concern for CPU or memory, a single node will do.

KVM CPU share / priority / overselling

i have question about KVM i could not find any satisfying answer in the net about.
Lets say i want to create 3 virtual machines on a host with 2 CPUs. I am assigning 1 cpu to 1 virtual machines. The other 2 virtual machines should be sharing 1 cpu. If it is possible i want to give 1 vm 30 % and the other one 70% of the cpu.
I know this does not make much sense but i am curious and want to test is :-)
I know that hypervisors like onapp can do that. But how do they do it?
KVM represents each virtual CPU as a thread in the host Linux system, actually as a thread in the QEMU process. So scheduling of guest VCPUs is controlled by the Linux scheduler.
On Linux, you can use taskset to force specific threads onto specific CPUs. So that will let you assign one VCPU to one physical CPU and two VCPUs to another. See, for example, https://groups.google.com/forum/#!topic/linuxkernelnewbies/qs5IiIA4xnw.
As far as controlling what percent of the CPU each VM gets, Linux has several scheduling policies available, but I'm not familiar with them. Any information you can find on how to control scheduling of Linux processes will apply to KVM.
The answers to this question may help: https://serverfault.com/questions/313333/kvm-and-virtual-to-physical-cpu-mapping. (Also that forum may be a better place for this question, since this one is intended for programming questions.)
If you search for "KVM virtual CPU scheduling" and "Linux CPU scheduling" (without the quotes), you should find plenty of additional information.

a guestOS process occupies VCPU at any given time?

Recently i`ve been studying something about hardware-supported virtualization.
I read about 3 states of host cpu ,thus the most common userspace,kernelspace and A New Guest State.And as i can see from the ps command,there is a process for the vm i started,and some 'sub'-threads for each cpu owned by the virtual machine.Also i noticed when the vm runs some io related program,some more threads will be created on the host,which i guess might be the responses of qemu for hardware emulation.
So here comes my question:For any certain time(time in guest state,not the other two),does a vcpu thread represent a guestOS process running(i mean 'occupy' and 'exclusively')?just the same as a physical cpu,for any given time in userspace,a user process is running on it.
This may sound a little stupid,I just want to figure it out for further research.
To make this question simple:
is the vcpu thread which runs on host machine associated with some guestOS process at any given time?
To further simplify it:
is it right when i say the guestOS processes are actually running on the host CPU directly and scheduled as ordinary host-processes?the difference between the two kinds of process being what we called virtualization?
Maybe i need another threads to solve some questions about guestOS process switching,but before that,hope you guys can help me with this one.
sincerely
MeNok
I posted this question on LQ and got the answer.
http://www.linuxquestions.org/questions/linux-virtualization-90/a-guestos-process-occupies-vcpu-at-any-given-time-4175419271/
VCPU is not a thread in host. KVM allows guest to run directly on a physical CPU with a less privilege guest mode. A timer interrupt will cause CPU back from guest mode to host mode and return to KVM. Since KVM is scheduled in kernel mode, a guest should be also scheduled in the host as well.

Load distribution to instances of a perl script running on each node of a cluster

I have a perl script (call it worker) installed on each node/machine (4 total) of a cluster (each running RHEL). The script itself is configured as a RedHat Cluster service (which means the RH cluster manager would ensure that one and exactly one instance of this script is running as long as at least one node in the cluster is up).
I have X amount of work to be done every day once a day, which this script does. So far the X was small enough and only one instance of this script was enough to do it. But now the load is going to increase and along with High Availability (viz already implemented using RHCS), I also need load distribution.
Question is how do I do that?
Of course I have a way to split the work in n parts of size X/n each. Options I had in mind:
Create a new load distributor, which splits the work in jobs of X/n. AND one of the following:
Create a named pipe on the network file system (which is mounted and visible on all nodes), post all jobs to the pipe. Make each worker script on each node read (atomic) from the pipe and do the work. OR
Make each worker script on each node listen on a TCP socket and the load distributor send jobs to each this socket in a round robin (or some other algo) fashion.
Theoretical problem with #1 is that we've observed some nasty latency problems with NFS. And I'm not even sure if NFS would support IPC via named pipes across machines.
Theoretical problem with #2 is that I have to implement some monitors to ensure that each worker is running and listening, which being a noob to Perl, I'm not sure if is easy enough.
I personally prefer load distributor creating a pool and workers pulling from it, rather than load distributor tracking each worker and pushing work to each. Any other options?
I'm open to new ideas as well. :)
Thanks!
-- edit --
using Perl 5.8.8, to be precise: This is perl, v5.8.8 built for x86_64-linux-thread-multi
If you want to keep it simple use a database to store the jobs and then have each worker lock the table and get the jobs they need then unlock and let the next worker do it's thing. This isn't the most scalable solution since you'll have lock contention, but with just 4 nodes it should be fine.
But if you start going down this road it might make sense to look at a dedicated job-queue system like Gearman.