Using Amdahl's law and run times to calculate the number of processors? - parallelism-amdahl

Let's say I have a program T that has a serial portion that consumes 355 s, and a parallel portion that uses 645 s.
How can I find out how many processors I need so that the program T's parallel run time is less than or equal to 51% of its serial run time?
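One way to set this up, as a rough sketch: I'm assuming "serial run time" means the run time on a single processor (355 + 645 = 1000 s) and that the parallel portion divides evenly across n processors with no overhead. The function name `min_processors` is just something I made up for illustration.

```python
import math

def min_processors(serial_s, parallel_s, target_fraction):
    """Smallest n such that serial_s + parallel_s / n is at most
    target_fraction * (serial_s + parallel_s).

    Assumes the parallel part scales perfectly with n (no overhead).
    Returns None when the serial part alone already exceeds the target.
    """
    total = serial_s + parallel_s
    # Time budget left for the parallel portion once the serial part is paid.
    budget = target_fraction * total - serial_s
    if budget <= 0:
        return None
    return math.ceil(parallel_s / budget)

# 355 + 645/n <= 510  ->  645/n <= 155  ->  n >= 4.16...
print(min_processors(355, 645, 0.51))  # 5
```

With 4 processors the run time would be 355 + 161.25 = 516.25 s, which is just over the 510 s target, so under these assumptions 5 is the smallest count that works.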

Related

In the Shortest Job First (SJF) scheduling algorithm, do IO bound jobs get priority over CPU bound jobs?

Recently I came across the statement that
In SJF IO bound jobs get priority over CPU bound jobs.
I found this statement on page 4 of this slide and also on page 3 of this slide. I decided to attach the corresponding pictures below, in case the links break in the future.
But I am having difficulty understanding the above, and it seems rather counterintuitive to me. My argument is as follows:
I assume a CPU bound process is one with a higher CPU Burst:(CPU Burst+IO Burst) ratio, and an IO bound process is one with a higher IO Burst:(CPU Burst+IO Burst) ratio. I assumed this after reading the textbook "Operating System Concepts" by Galvin et al.; the excerpt is below:
An I/O-bound process is one that spends more of its time doing I/O than it spends doing computations. A CPU-bound process, in contrast, generates I/O requests infrequently, using more of its time doing computations.
Which I guess agrees with what the professor says here.
Based on this I came up with the following examples:
Suppose I have two jobs:
JOB 1: CPU BURST = 10 units; IO BURST = 100 units
JOB 2: CPU BURST = 100 units; IO BURST = 10 units
SJF shall schedule JOB 1 first, which is IO bound.
———————————————————————————
Suppose I have two other jobs:
JOB 3: CPU BURST = 10 units; IO BURST = 1 unit
JOB 4: CPU BURST = 100 units; IO BURST = 200 units
SJF shall schedule JOB 3 first, which is CPU bound.
From the above examples I do not find any such correlation that SJF gives priority to IO bound jobs.
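To make the two examples concrete, here is a small sketch of them. I'm assuming non-preemptive SJF that orders jobs purely by CPU burst, and I'm classifying a job as IO bound when its IO burst is longer than its CPU burst; the `Job` class and `sjf_order` are names I invented for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cpu_burst: int
    io_burst: int

    @property
    def kind(self):
        # Assumed classification: compare the two bursts directly.
        return "IO bound" if self.io_burst > self.cpu_burst else "CPU bound"

def sjf_order(jobs):
    """Order jobs by CPU burst only (shortest first); IO burst is ignored."""
    return sorted(jobs, key=lambda job: job.cpu_burst)

pair_1 = [Job("JOB 1", 10, 100), Job("JOB 2", 100, 10)]
pair_2 = [Job("JOB 3", 10, 1), Job("JOB 4", 100, 200)]

for pair in (pair_1, pair_2):
    first = sjf_order(pair)[0]
    print(first.name, "runs first and is", first.kind)
```

In this sketch the scheduler only ever looks at `cpu_burst`, so whichever job goes first depends on burst length alone, not on how the job is classified.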

What fraction of the CPU time is wasted ? (Modern Operating Systems, 4th ed)

it's my first post here.
I'm currently learning Modern Operating Systems and I'm stuck at this question : A computer system has enough room to hold five programs in its main memory. These programs are idle waiting for I/O half of the time. What fraction of the CPU time is wasted?
The answer is 1/32, but why ?
The sentence "These programs are idle waiting for I/O half of the time" is ambiguous. Let's look at a few different ways of interpreting this sentence and see if they match the expected answer:
a) "Each of the 5 programs spends 50% of the total time waiting for IO". In this case, while one program is waiting for IO the CPU could be being used by other programs; and all programs combined could use 100% of CPU time with no time wasted. In fact, you'd be able to use 100% of CPU time with only 2 programs (the 1st program uses the CPU while the 2nd program waits for IO, then the 2nd program uses the CPU while the 1st task waits for IO, then ...). This can't be the intended meaning of "These programs are idle waiting for I/O half of the time" because the answer (possibly zero CPU time wasted) doesn't match the expected answer.
b) "All of the programs are idle waiting for I/O at the same time, for half the time". This can't be the intended meaning of the question because the answer would obviously be "50% of CPU time is wasted" and doesn't match the expected answer.
c) "Each program spends half of the time available to it waiting for IO". In this case, the first program has 100% of CPU time available to it but spends 50% of the time using the CPU and waits for IO for the other 50% of the time, leaving 50% of CPU time available for the next program; then the 2nd program uses 50% of the remaining CPU time (25% of total time) using the CPU and 50% of the remaining CPU time (25% of total time) waiting for IO, leaving 25% of CPU time available for the next program; then the third program uses 50% of the remaining CPU time (12.5% of total time) using the CPU and 50% of the remaining CPU time (12.5% of total time) waiting for IO, leaving 12.5% of CPU time available to the next programs, then...
In this case, the remaining time is halved by each program, so you get a "negative power of 2" sequence (1/2, 1/4, 1/8, 1/16, 1/32) that arrives at an answer that matches the expected answer.
Because we get the right answer for this interpretation, we can assume that this is what "These programs are idle waiting for I/O half of the time" was supposed to mean.
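The same number also falls out of the usual probabilistic way of modelling multiprogramming, which I believe is what the book intends: assume each program waits for I/O a fraction p of the time, independently of the others, so the CPU is idle only when all n programs are waiting at once, i.e. with probability p^n. A minimal sketch under that assumption (the function name `wasted_fraction` is mine):

```python
from fractions import Fraction

def wasted_fraction(p_wait, n_programs):
    """Fraction of CPU time wasted if each of n_programs independently
    waits for I/O a fraction p_wait of the time (wasted = p_wait ** n)."""
    return Fraction(p_wait) ** n_programs

# The halving sequence 1/2, 1/4, 1/8, 1/16, 1/32 for 1..5 programs.
for n in range(1, 6):
    print(n, wasted_fraction(Fraction(1, 2), n))
```

For p = 1/2 and n = 5 this gives 1/32 wasted, so CPU utilization would be 31/32 under the independence assumption.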

Calculate the performance of a multicore architecture?

Consider a multicore architecture with 10 computing cores: 2 processor cores and 8 coprocessors. Each processor core can deliver 2.0 GFlops, while each coprocessor can deliver 1.0 GFlops. All computing cores can perform calculations simultaneously. Any instruction can execute in either processor or coprocessor cores unless there are any explicit restrictions.
If 70% of dynamic instructions in an application are parallelizable, what is the maximum average performance (Flops) you can get in the optimal situation? Please note that the remaining 30% instructions can be executed only after the execution of the parallel 70% is over.
Consider another application where all the dynamic instructions can be partitioned into 6 groups (A, B, C, D, E, F) with the following dependency. For example, A --> C implies that all the instructions in A need to be completed before starting the execution of instructions in C. Each of the first four groups (A, B, C and D) contains 20% of the dynamic instructions whereas each of the remaining two groups (E and F) contains 10% of the dynamic instructions. All the instructions in each group must be executed sequentially on the same processor or coprocessor core. How to schedule them on the multicore architecture to achieve the best possible performance? What is the maximum average performance (Flops) now?
A(20%) --> C(20%) -->
                      E(10%) --> F(10%)
B(20%) --> D(20%) -->
For the first part, you need to use Amdahl's Law, which is:
max speed-up = 1/(1-p+p/n)
where p is the parallelizable fraction and n is the improvement factor in executing the parallel portion.
(Note that the Amdahl's Law formula can be used for first order estimates on other types of changes. E.g., given a factor of N reduction in ALU energy use and P fraction of energy used by the ALU, one can find the improvement in total energy use.)
In your case, since the serial portion would be executed on the higher performance (2 GFLOPS) processor core, n is 6 ([8 coprocessor cores * 1 GFLOPS/core + 2 processor cores * 2 GFLOPS/core]/ 2 GFLOPS/processor core).
A quick calculation shows the max speed-up you can get is 2.4 relative to 1 processor core (1/(0.3 + 0.7/6) = 2.4). The maximum FLOPS would therefore be the speed-up times the speed if the whole program were executed serially on one processor core, i.e., 2.4 * 2 GFLOPS = 4.8 GFLOPS.
For the second part, note that initially there are two independent instruction streams: A -> C and B -> D. Since the system has two processor cores, both can be executed in parallel on the higher performance processor cores. Furthermore, both have the same amount of work (40% of total for each stream), so on cores of the same performance they will complete at the same time.
Since E depends on results from both C and D, it must be started after both finish. E and F would execute on a processor core (which core is arbitrary since E must wait for the tasks running on both processor cores to complete).
As you can see 80% of the program (40% for A+C; 40% for B+D) can be parallelized by a factor of 2 and 20% of the program (E+F) is serial. You can then just plug the numbers into the Amdahl's Law formula (p=0.8, n=2).
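If you want to check the arithmetic for both parts, here is a quick sketch. It just plugs the numbers from this answer into the Amdahl's Law formula and multiplies by the 2 GFLOPS of one processor core; `amdahl_speedup` and the variable names are my own, not anything from the question.

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: max speed-up = 1 / (1 - p + p / n)."""
    return 1.0 / ((1.0 - p) + p / n)

PROCESSOR_GFLOPS = 2.0    # one processor core, the serial baseline
COPROCESSOR_GFLOPS = 1.0

# Part 1: the parallel 70% runs on all 10 cores, the serial 30% on one
# processor core, so n is total throughput relative to one processor core.
n_part1 = (8 * COPROCESSOR_GFLOPS + 2 * PROCESSOR_GFLOPS) / PROCESSOR_GFLOPS
part1_gflops = amdahl_speedup(0.7, n_part1) * PROCESSOR_GFLOPS

# Part 2: A+C and B+D (80% in total) run on the 2 processor cores,
# then E+F (20%) runs serially on one processor core.
part2_gflops = amdahl_speedup(0.8, 2) * PROCESSOR_GFLOPS

print(n_part1)                  # 6.0
print(round(part1_gflops, 2))   # 4.8
print(round(part2_gflops, 2))   # 3.33
```

So under these assumptions the second application tops out at about 3.33 GFLOPS (a speed-up of 1/(0.2 + 0.8/2), roughly 1.67, over one processor core).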

To find execution time on a multi-core machine

I am preparing for a competitive exam and I have an operating system question.
I can't work out how to solve it. Please help me out.
Q-)
A program took 160 seconds to execute on a single processor but only 64 seconds on a
4 core multicore. What is the best estimate for the execution time on a 64 core machine?
I don't think this is strictly relevant to programming (you might find this more relevant on the Math StackExchange), but I'll attempt to answer it anyway.
The answer will depend entirely on how you model execution time vs number of cores. You could model the execution time as inversely proportional to the number of cores. For example, I used the following model:
t = c + k/n
where t is time in seconds, n is the number of cores, and c (which could represent overhead) and k (a factor) are constants.
Solve simultaneously
160 = c + k/1
64 = c + k/4
to get k = 128 and c = 32.
Then just substitute n = 64:
t = 32 + 128/64 = 34
So, you get 34 seconds according to this model. Of course, since you don't know the exact model, this can only be a calculated guess.
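Here is a small sketch of that calculation, assuming the model is t = c + k/n (a fixed part c plus a part k that divides across n cores), which is consistent with the constants quoted above. The helper `fit_model` is a name I made up; it just solves the two equations for the two unknowns.

```python
def fit_model(n1, t1, n2, t2):
    """Solve t = c + k / n for (c, k) from two (cores, seconds) measurements."""
    # Subtracting the two equations eliminates c: t1 - t2 = k * (1/n1 - 1/n2).
    k = (t1 - t2) / (1.0 / n1 - 1.0 / n2)
    c = t1 - k / n1
    return c, k

c, k = fit_model(1, 160, 4, 64)
estimate_64 = c + k / 64

print(c, k)          # 32.0 128.0
print(estimate_64)   # 34.0
```

Note that c = 32 is also the floor of this model: no matter how many cores you add, the estimate never drops below 32 seconds.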

Amdahl's law example

Can someone help me with this example please and show me how to work the second part?
The question is:
i. If one third of a weather prediction algorithm is inherently serial and the remainder parallelizable, what is the minimum number of cores needed to guarantee a 150% speedup over a single core implementation?
ii. Your boss revises the figure to 200%. What is your new answer?
Thanks very much in advance !!
Guess: If the algorithm is 1/3 serial and 2/3 parallel...I would think that each core you added would give you a 66% increase in performance...So for 150% increase, you'd need 3 more cores, and for a 200% increase, you'd need 4.
This is a guess. Your textbook might be more helpful :)
If the algorithm runs on a single core and takes 90 minutes then 30 minutes is for the serial part and 60 minutes for the parallel part.
Add a CPU:
30 is for the serial part and 30 for the parallel part(half of the 60 overlaps with the serial part).
90 / 60 = 150% increase.
I am a bit late, but here are the answers:
1) 150% increase -> 2 cores at least required as dbasnett said;
2) 200% increase -> 4 cores at least required, based on Amdahl's law:
Speedup(N) = 1 / ((1 - P) + P/N)
Here, 90 minutes overall are required to perform the calculation. P is the part of the algorithm that is actually enhanced (the parallelizable part), which is 2/3 of the 90 minutes, and N is the number of cores, so when there is only one core:
Speedup(1) = 1 / ((1 - 2/3) + (2/3)/1) = 1
You get 1, which means 100%, which is how the algorithm performs the standard way (without multi-core acceleration and therefore no parallelization speedup).
Now, we must find the number of cores N for which the previous equation equals 2, where 2 means that the algorithm performs in half the time (45 minutes instead of 90 when there's no parallelization) and therefore with a 200% speedup:
1 / ((1 - 2/3) + (2/3)/N) = 2
Since:
(1 - 2/3) + (2/3)/N = 1/2
We see that:
(2/3)/N = 1/2 - 1/3 = 1/6, so N = 4
So with 4 cores computing the 2/3 of the algorithm in parallel you get a 200% speedup. The same goes for 150%: you will get 2, as dbasnett already told you.
Pretty simple.
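If you'd rather not do the algebra by hand, here is a rough sketch that solves 1 / ((1 - P) + P/N) >= S for the smallest whole N, where P is the parallelizable fraction and S is the target speedup as a factor (1.5 for 150%, 2 for 200%). The function name `min_cores` is mine, and I'm using exact fractions so the rounding up doesn't trip over floating point.

```python
import math
from fractions import Fraction

def min_cores(parallel_fraction, target_speedup):
    """Smallest N with 1 / ((1 - P) + P / N) >= S, or None if unreachable."""
    p = Fraction(parallel_fraction)
    s = Fraction(target_speedup)
    # Rearranged: P / N <= 1/S - (1 - P). The right-hand side must be positive.
    slack = 1 / s - (1 - p)
    if slack <= 0:
        return None
    return math.ceil(p / slack)

P = Fraction(2, 3)
print(min_cores(P, Fraction(3, 2)))  # 2
print(min_cores(P, 2))               # 4
print(min_cores(P, 3))               # None
```

The last call returns None because with one third serial the speedup can only approach 3 and never actually reaches it.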
Note that a complex algorithm may imply further divisions of its parallelizable parts (and in theory you can have a different number of processing units per parallelizable part concurrently):
You can further look at Wikipedia (there's also an example):
http://en.wikipedia.org/wiki/Amdahl%27s_law#Description
Anyway, the principle is the same:
Let T be the time an algorithm needs to execute in order to complete, A its serial part, B its parallelizable part (so T = A + B) and N the number of parallel CPUs. You can divide B into further smaller sections (say C, D and G, with B = C + D + G) and perform the calculation on each part:
Speedup = T / (A + C/N + D/N + G/N)
For C, D and G you may, e.g., adopt M CPUs instead of N (the speedup will of course differ if M != N).
And in the end, you will arrive at a point where having more CPUs doesn't matter anymore, since:
lim (N -> infinity) T / (A + B/N) = T / A
And your algorithm's speedup will at most tend to the total execution time (T) divided by the execution time of the serial part only (A).
Therefore parallel calculation comes in really handy only when you have a low execution time for the serial part of your algorithm.
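To see that ceiling numerically, here is a tiny sketch using the 90 minute example (T = 90, A = 30, so B = 60). It assumes the simple form speedup = T / (A + B/N) with a perfectly divisible parallel part; the function name `speedup` is just for illustration.

```python
def speedup(total, serial, cores):
    """Speedup = T / (A + B / N), with B = T - A the parallelizable part."""
    parallel = total - serial
    return total / (serial + parallel / cores)

for cores in (1, 2, 4, 16, 256, 4096):
    print(cores, round(speedup(90, 30, cores), 3))

# The ceiling is T / A, reached only in the limit of infinitely many cores.
print("limit:", 90 / 30)
```

Going from 1 to 4 cores doubles the speed, but going from 256 to 4096 cores barely moves the result, since it is already very close to T / A = 3.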