I have a question:
In which scenario is FIFO (First In, First Out) scheduling possible in multiprogramming when we have only one processor?
The multiprogramming concept is the ability of the operating system to keep multiple programs in memory.
Multiprogramming actually means switching between processes, i.e. interleaving the I/O time and CPU time of the processes.
So it is independent of the number of processors (even with only one processor, the system may be working on multiple programs).
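For example, with non-preemptive FIFO (also called FCFS) the single CPU simply runs the ready queue in arrival order: each process keeps the CPU until it finishes or blocks on I/O, and then the next process in the queue gets it. A minimal sketch in MATLAB, with made-up process names and burst times:

    names  = {'P1', 'P2', 'P3'};   % ready-queue order = arrival order
    bursts = [5 3 8];              % CPU burst of each process, in time units
    t = 0;
    for k = 1:numel(bursts)        % one CPU: run each process to completion
        fprintf('%s runs from t=%d to t=%d\n', names{k}, t, t + bursts(k));
        t = t + bursts(k);
    end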
I am running a for loop using MATLAB's parfor. My CPU is an Intel i7-9750H (6 physical cores, 12 logical processors).
I set the preferred number of workers to 24; however, MATLAB sets this number to 6. Is the number of workers bounded by the number of physical cores, or by (number of cores) x (threads per core) = 6 x 2 = 12?
MATLAB prefers to limit the number of workers to the number of physical cores (six in your case).
Your CPU (Intel i7-9750H) has hyperthreading, i.e. it can run multiple threads per core (here, two). However, this is of no use if you run the cores under full load, because there are then simply no resources left over to switch to a different task (which is effectively what the additional threads are).
See the documentation.
Restricting to one worker per physical core ensures that each worker
has exclusive access to a floating point unit, which generally
optimizes performance of computational code. If your code is not
computationally intensive, for example, it is input/output (I/O)
intensive, then consider using up to two workers per physical core.
Running too many workers on too few resources may impact performance
and stability of your machine.
Note that MATLAB needs to stream data to every worker in order to run the distributed code. This is a one-time initialization effort, and it is the reason why you won't cut the runtime in half simply by doubling the number of cores/workers. It also explains why there is no use for MATLAB to exploit hyperthreading: it would just increase the initial streaming effort without any speed-up. In fact, the core would probably force MATLAB to save intermediate results and switch to the other task from time to time... which is the same task as before ;)
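As a rough sketch of how you could size the pool to the physical core count yourself (feature('numcores') is an undocumented but widely used call; someExpensiveComputation is a placeholder for your loop body):

    nCores = feature('numcores');     % physical cores (6 on an i7-9750H)
    pool = parpool('local', nCores);  % one worker per physical core
    out = zeros(1, 100);
    parfor i = 1:100
        out(i) = someExpensiveComputation(i);  % placeholder function
    end
    delete(pool);                     % shut the pool down when done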
What is the degree of multiprogramming in OS?
Is it the number of processes in the ready queue, or the number of processes in memory?
In a multiprogramming-capable system, jobs to be executed are loaded into a pool. Some number of those jobs are loaded into main memory, and one is selected from the pool for execution by the CPU. If at some point the program in progress terminates or requires the services of a peripheral device, the control of the CPU is given to the next job in the pool.
An important concept in multiprogramming is the degree of multiprogramming. The degree of multiprogramming describes the maximum number of processes that a single-processor system can accommodate efficiently.
These are some of the factors affecting the degree of multiprogramming:
1. Memory - The primary factor is the amount of memory available to be allocated to executing processes. If the amount of memory is too limited, the degree of multiprogramming will be limited because fewer processes will fit in memory (see the sketch after this list).
2. Operating system - The means by which resources are allocated to processes. If the operating system cannot allocate resources to executing processes in a fair and orderly fashion, the system will waste time in reallocation, or process execution could enter a deadlock state as programs wait for allocated resources to be freed by other blocked processes.
3. Other factors - Program I/O needs, program CPU needs, and memory and disk access speed.
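As a back-of-the-envelope sketch of the memory factor (all figures are made up): if each process needs roughly a fixed amount of resident memory, the memory-bound degree of multiprogramming is just the quotient of the two:

    totalMemMB   = 8192;  % memory available to user processes (made-up figure)
    perProcMemMB = 512;   % average resident size per process (made-up figure)
    degree = floor(totalMemMB / perProcMemMB)  % at most 16 resident processes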
Hope this answers you. :)
If not, you can get more detail here: http://www.tcnj.edu/~coburn/os
For a system with a single CPU core, there will never be more than one
process running at a time, whereas a multicore system can run multiple
processes at one time. If there are more processes than cores, excess
processes will have to wait until a core is free and can be
rescheduled. The number of processes currently in memory is known as
the degree of multiprogramming.
Excerpt from: Operating System Concepts, 10th Edition, Abraham Silberschatz
I want to know the differences between
1. labs
2. workers
3. cores
4. processes
Is it just semantics, or are they all different?
labs and workers are MathWorks terminologies, and they mean roughly the same thing.
A lab or a worker is essentially an instance of MATLAB (without a front-end). You run several of them, and you can run them either on your own machine (requires only Parallel Computing Toolbox) or remotely on a cluster (requires Distributed Computing Server). When you execute parallel code (such as a parfor loop, an spmd block, or a parfeval command), the code is executed in parallel by the workers, rather than by your main MATLAB.
Parallel Computing Toolbox has changed and developed its functionality quite a lot over recent releases, and has also changed and developed the terminology it uses to describe the way it works. At some point it was convenient to refer to them as labs when running an spmd block, but as workers when running a parfor loop or working on jobs and tasks. I believe they are now moving toward always calling them workers (although there's a legacy in the commands labSend, labReceive, labBroadcast, labindex and numlabs).
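A minimal sketch of that terminology in action, assuming Parallel Computing Toolbox is installed (the pool size of 4 is arbitrary):

    parpool('local', 4);    % start four workers
    spmd
        % inside spmd, each worker ("lab") knows its index and the pool size
        fprintf('I am lab %d of %d\n', labindex, numlabs);
    end
    delete(gcp('nocreate'));  % shut the pool down again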
cores and processes are different, and are not themselves anything to do with MATLAB.
A core is a physical part of your processor - you might have a dual-core or quad-core processor in your desktop computer, or you might have access to a really big computer with many more than that. By having multiple cores, your processor can do multiple things at once.
A process is (roughly) a program that your operating system is running. Although the OS runs multiple programs simultaneously, it typically does this by interleaving operations from each process. But if you have access to a multiple-core machine, those operations can be done in parallel.
So you would typically want to tell MATLAB to start one worker for each of the cores on your machine. Each of those workers will be run as a process by the OS, and they will end up running in parallel, one worker per core.
The above is quite simplified, but I hope gives a roughly accurate picture.
Edit: moved description of threads from a comment to the answer.
Threads are something different again. Threads are also not in themselves anything to do with MATLAB.
Let's go back to processes for a moment. One thing I didn't mention above is that the OS allocates each process a specific block of memory which other processes shouldn't be able to touch, so that it's difficult for them to interact with each other and mess things up.
A thread is like a process within a process - it's a stream of operations that the process runs. Typically, operations from each thread would be interleaved, but if you have multiple cores, they can also be parallelized across the cores.
However, unlike processes, they all share a memory block, which is OK because they're all managed by the same program so it should matter less if they're allowed to interact.
Regular MATLAB automatically uses multiple threads to parallelize many built-in operations (such as matrix multiplication, svd, eig and other linear algebra) - that's without you doing anything, and whether or not you have Parallel Computing Toolbox.
However, MATLAB workers are each run as a single process with a single thread, so you have full control over how to parallelize.
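You can observe this difference yourself; a small sketch (assumes Parallel Computing Toolbox; parfeval starts a pool automatically if none is open):

    maxNumCompThreads                     % on the client: typically the core count
    f = parfeval(@maxNumCompThreads, 1);  % ask a worker the same question
    fetchOutputs(f)                       % on a worker: typically 1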
I think workers are synonymous with processes. The term "cores" relates to the hardware. Labs are a mechanism that allows workers to communicate with each other. Each worker has at least one lab but can own more.
This piece of discussion may be useful:
http://www.mathworks.com/matlabcentral/answers/5529-mysterious-behavior-in-parfor-i-know-sounds-basic-but
I hope someone here will deliver more information in a more rigorous way.
I know MP (multiprocessing) is the management of multiple processes across multiple processors, but is there any difference between that and SMP? Is it that in SMP you can execute multiple threads from the same process simultaneously, whereas in MP a process can only occupy one processor at a time?
Example of what I think the differences are:
SMP
P1 has 3 threads: P1T1, P1T2 and P1T3
P2 has 2 threads: P2T1 and P2T2.
On a computer with 3 processors, you can assign P1T1 to processor 1, P1T2 to processor 2, and P1T3 to processor 3 simultaneously if all are available; or P2T1 to processor 1, P2T2 to processor 2, and P1T1 to processor 3.
MP
P1 has 3 threads: P1T1, P1T2 and P1T3
P2 has 2 threads: P2T1 and P2T2.
On a computer with 3 processors, you can assign P1T1 to processor 1, but P1T2 and P1T3 have to wait until P1T1 is done in order to execute, while P2T1 can go to processor 2, and, again, P2T2 would have to wait until P2T1 is done executing before it can execute.
Does this make sense? If it does, am I on the right track? Thanks, I've got an OS exam today and I'm studying. Thank you for any help you guys can provide.
Also, how are threads scheduled? I know that is a very broad question, but is there any specific way, or is it based on the scheduling the system has implemented? I know there is round-robin scheduling, higher priority, time slicing, time sharing, shortest amount of time... If this question doesn't make sense, no worries, I appreciate any help you guys can give.
Actually, SMP is a subcategory of MP, so the question of difference doesn't make much sense. Any MP system is one of the two: either symmetric MP (SMP) or asymmetric MP (AMP).
In your case, the examples can't be used to differentiate the two, for the reason mentioned above.
Also, in SMP, the two (or more) CPUs, whether separate processors or different cores, work on the same shared memory to get the work done!
As mentioned in Wikipedia about Symmetric Multiprocessing :-
Symmetric multiprocessing (SMP) involves a symmetric multiprocessor
system hardware and software architecture where two or more identical
processors connect to a single, shared main memory, have full access
to all I/O devices, and are controlled by a single operating system
instance that treats all processors equally, reserving none for
special purposes. Most multiprocessor systems today use an SMP
architecture. In the case of multi-core processors, the SMP
architecture applies to the cores, treating them as separate
processors.
In ye olde days of multiprocessing systems (e.g., the VAX-11/782), one processor was the master and the remainder were slaves. The master processor assigned tasks to the other processors when it was idle, and did work otherwise.
In an SMP system, god created all processors equal. They use locking mechanisms to select tasks.
I'm thinking of slowly picking up parallel programming. I've seen people use clusters with OpenMPI installed to learn this stuff. I do not have access to a cluster but have a quad-core machine. Will I be able to experience any benefit here? Also, if I'm running Linux inside a virtual machine, does it make sense to use OpenMPI inside a VM?
If your goal is to learn, you don't need a cluster at all. Your quad-core (or any dual-core, or even a single-core) computer will be more than enough. The main point is to learn how to think "in parallel" and how to design your application accordingly.
Some important points are to:
Exploit different parallelism paradigms, like divide-and-conquer, master-worker, SPMD, ... depending on the data and task dependencies of what you want to do.
Choose different data-division granularities to check the computation/communication ratio (in case of message passing), or to check the amount of serial execution caused by mutual exclusion on memory regions.
Having a quad-core, you can measure the speedup of your approach (the performance gain attained through parallelization), which is normally given by dividing the time of the non-parallelized execution by the time of the parallel execution: speedup = T_serial / T_parallel.
The closer you get to 4 (four cores meaning one quarter of the execution time), the better your parallelization strategy was (provided you could evenly distribute work and data).
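A minimal sketch of such a measurement (the workload is made up, and note that the first parfor also pays the one-time pool start-up cost, so run it twice for a fair number):

    n = 2e7;
    x = rand(1, n);
    tic
    s = 0;
    for i = 1:n        % serial baseline
        s = s + sqrt(x(i));
    end
    tSerial = toc;
    tic
    s = 0;
    parfor i = 1:n     % same reduction, spread over the workers
        s = s + sqrt(x(i));
    end
    tParallel = toc;
    fprintf('speedup = %.2f (ideal on four workers: 4.00)\n', tSerial/tParallel);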