Average memory access time - cpu-architecture

I would like to know whether I solved the equation below correctly:
Find the average memory access time for a processor with a 3 ns clock cycle time, a miss penalty of 40 clock cycles, a miss rate of 0.08 misses per instruction, and a cache access time of 1 clock cycle.
AMAT = Hit Time + Miss Rate * Miss Penalty
Hit Time = 3ns, Miss Penalty = 40ns, Miss Rate = 0.08
AMAT = 3 + 0.08 * 40 = 6.2ns

Check the "Miss Penalty". Be more careful to avoid trivial mistakes.
The question that you tried to answer cannot actually be answered, since you are given 0.08 misses per instruction but you don't know the average number of memory accesses per instruction. In an extreme case, if only 8 percent of instructions accessed memory, then every memory access would be a miss.
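For concreteness, here is a minimal Python sketch of the calculation with the unit conversion made explicit. It assumes (as the answer above notes you must) that the 0.08 is treated as misses per memory access, and it converts the 40-cycle miss penalty into nanoseconds, which is the step missed in the attempt above:

    # Hedged sketch: treats 0.08 as misses per *access* and converts cycles to ns.
    clock_ns = 3                    # clock cycle time in ns
    hit_time = 1 * clock_ns         # cache access time: 1 cycle -> 3 ns
    miss_penalty = 40 * clock_ns    # miss penalty: 40 cycles -> 120 ns, not 40 ns
    miss_rate = 0.08

    amat = hit_time + miss_rate * miss_penalty
    print(amat)                     # 3 + 0.08 * 120 = 12.6 ns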

Related

Compute the average load time

A computer has a cache, main memory and a hard disk. If a referenced
word is in the cache, it takes 15 ns to access it. If it is in main memory
but not in the cache, it takes 85 ns to load (the block containing) it
into the cache (this includes the time to originally check the cache),
and then the reference lookup is started again. If the word is not in
main memory, it takes 10 ms to load (the block containing) it from
the disk into main memory, and then the reference lookup is started
again. The cache hit ratio is 0.4. In the case of a cache miss, the
probability that the word is in the main memory is 0.7. Compute the
average load time.
My Answer
Given:
Cache access time = 15 ns
Cache hit rate = 0.4
Cache miss rate = 1 – 0.4 = 0.6
RAM access time = 85 ns
RAM hit rate = 0.7
RAM miss rate = 1 – 0.7 = 0.3
Disk access time = 10ms = 10000000 ns
> Average access time = (cache access time x cache hit rate) + (cache
> miss rate) x (RAM access time x RAM hit rate) + (cache miss rate x RAM
> miss rate x disk access time)
> = (15*0.4) + (0.6)(85*0.7) + (0.6)(0.3)(10000000)
> = 1,800,041.7 ns
I hope you are enjoying the computer systems class at Birkbeck... ;P
I think you missed something though:
(1) You are assuming the 10 ms includes the initial check of the cache (it was specified for the 85 ns but not for the 10 ms, so I would add it to be on the safe side).
(2) It says the reference lookup is started again after loading into the cache and main memory respectively... So from the question I understand that words can only be accessed from the cache (otherwise why bother with the 85 ns?). Hence, I think you need to add the time it takes to load the word into the cache from main memory when it is initially retrieved from the disk. Also, although I am not entirely sure on this one as it is a bit ambiguous, I think you need to add another 15 ns for the word to be accessed in the cache after it's loaded from main memory...
Interested to hear some thoughts
Kindly share the answers to the first three questions in the assignment here if you have done them.. :P
For this question the answer is:
Cache access time = 15 ns
Memory access time = 85 ns + 15 ns = 100 ns
Disk access time = 10 x 10^6 ns + 100 ns = 10,000,100 ns
Average load time = 0.4 x 15 ns + 0.6[0.7 x 100 ns + 0.3(10,000,100 ns)]
Average load time = 6 + 0.6(70 + 3,000,030) = 6 + 1,800,060
Average load time = 1,800,066 ns ≈ 1.8 ms
If it is in main memory but not in the cache, it takes 85 ns to load (the block containing) it into the cache (this includes the time to originally check the cache).
You don't need to add the 85 ns (memory) and the 15 ns (cache).
For this question the answer is:
Cache access time = 15 ns
Memory access time = 85 ns
Disk access time = 10 x 10^6 ns + 85 ns = 10,000,085 ns
Average load time = 0.4 x 15 ns + 0.6[0.7 x 85 ns + 0.3(10,000,085 ns)]
Average load time = 6 + 0.6(59.5 + 3,000,025.5) = 6 + 1,800,051
Average load time = 1,800,057 ns ≈ 1.8 ms
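For reference, the corrected calculation above can be written out as a small Python script. It follows that answer's reading of the question: the 85 ns and the 10 ms each already include the initial cache check, and the disk path additionally pays the 85 ns to move the block from main memory into the cache (the possible extra 15 ns cache re-access debated earlier is not included, matching the answer above):

    # Sketch of the corrected answer above.
    cache_time = 15            # ns, word found in the cache
    mem_time = 85              # ns, load block into the cache from main memory
    disk_time = 10e6 + 85      # ns, load from disk, then the 85 ns reload path

    hit, miss = 0.4, 0.6       # cache hit / miss probabilities
    in_mem, on_disk = 0.7, 0.3 # given a cache miss: word in RAM / word on disk

    avg = hit * cache_time + miss * (in_mem * mem_time + on_disk * disk_time)
    print(avg)                 # 6 + 0.6 * (59.5 + 3000025.5) = 1800057.0 ns ~ 1.8 ms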

How to calculate cache miss rate

I'm trying to answer a computer architecture past paper question (NOT homework).
My question is how to calculate the miss rate (the complete question asks for the average memory access time). The complete question is:
For a given application, 30% of the instructions require memory access. Miss rate is 3%. An instruction can be executed in 1 clock cycle. L1 cache access time is approximately 3 clock cycles while L1 miss penalty is 72 clock cycles. Calculate the average memory access time.
Needed equations:
Average memory access time = Hit time + Miss rate x Miss penalty
Miss rate = no. of misses / total no. of accesses (this was found on Stack Overflow)
As I mentioned above, I found how to calculate the miss rate on Stack Overflow (I checked that question, but it does not answer mine); the problem is that I cannot see how to find the miss rate from the values given in the question.
What I have done up to now
Average memory access time = 30% * (1 + 3% * 72) + 100% * (1 + M*72)
M - miss rate
What I need to find is M. (If I am correct up to now; if not, please tell me what I've messed up.)
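No answer is quoted here, but for what it is worth, a minimal sketch under the most common reading of that question, assuming the quoted 3% is already the L1 miss rate per memory access (so nothing extra has to be derived), is just the plain AMAT formula given above:

    # Sketch assuming 3% is the miss rate per memory access
    # (an assumption, not stated explicitly in the question).
    hit_time = 3        # L1 access time in clock cycles
    miss_rate = 0.03
    miss_penalty = 72   # clock cycles

    amat = hit_time + miss_rate * miss_penalty
    print(amat)         # 3 + 0.03 * 72 = 5.16 clock cycles

Under this reading, the 30% of instructions that access memory would only matter if you went on to compute CPI or time per instruction, which is exactly the ambiguity the attempt above is wrestling with.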

What is the speedup? Can't understand the solution

I'm going through a Computer Architecture MOOC in my own time. There is a problem I can't solve. The solution is provided, but I can't understand it. Can someone help me out? Here is the problem and the solution to it:
Consider an unpipelined processor. Assume that it has 1-ns clock cycle
and that it uses 4 cycles for ALU operations and 5 cycles for branches
and 4 cycles for memory operations. Assume that the relative
frequencies of these operations are 50 %, 35 % and 15 % respectively.
Suppose that due to clock skew and set up, pipelining the processor
adds 0.15 ns of overhead to the clock. Ignoring any latency impact,
how much speed up in the instruction execution rate will we gain from
a pipeline?
Solution
The average instruction execution time on the unpipelined processor is
clock cycle * average CPI = 1 ns * ((0.5 * 4) + (0.35 * 5) + (0.15 * 4)) = 4.35 ns.
The average instruction execution time on the pipelined processor is 1 ns + 0.15 ns = 1.15 ns.
So speedup = 4.35 / 1.15 = 3.78.
My question:
Where is the 0.15 coming from in the average instruction execution time on a pipelined processor? Can anyone explain?
Any help is really appreciated.
As the question says those 0.15ns are due to clock skew and pipeline setup.
Forget about pipeline setup and imagine that all of the 0.15ns are from clock skew.
I think the solution implies that the CPI (cycles per instruction) is one (without the overhead), i.e., a 1 ns clock cycle, which I assume is the CPU clock (1 GHz).
However, I don't see anywhere that the CPI is clearly identified as one.
Did I misunderstand anything here?
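To make the quoted solution and the CPI question above concrete, here is the arithmetic spelled out in Python. The 0.15 ns is simply added to the 1 ns clock because, ignoring hazards, an ideal pipeline completes one instruction per (lengthened) cycle, i.e. the pipelined CPI is taken to be 1:

    # Sketch of the quoted solution's arithmetic.
    cycle = 1.0                                        # ns, unpipelined clock cycle
    freq = {"alu": 0.50, "branch": 0.35, "mem": 0.15}  # instruction mix
    cpi = {"alu": 4, "branch": 5, "mem": 4}            # cycles per operation type

    unpipelined = cycle * sum(freq[k] * cpi[k] for k in freq)  # 4.35 ns
    pipelined = cycle + 0.15                                   # CPI = 1, cycle stretched by skew/setup
    print(unpipelined, pipelined, unpipelined / pipelined)     # 4.35 1.15 ~3.78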

Operating Systems Virtual Memory

I am a student taking an operating systems course for the first time. I have a doubt about the performance degradation calculation for demand paging. In the Silberschatz book on operating systems, the following lines appear:
"If we take an average page-fault service time of 8 milliseconds and a
memory-access time of 200 nanoseconds, then the effective access time in
nanoseconds is
effective access time = (1 - p) x (200) + p x (8 milliseconds)
= (1 - p) x 200 + p x 8,000,000
= 200 + 7,999,800 x p.
We see, then, that the effective access time is directly proportional to the
page-fault rate. If one access out of 1,000 causes a page fault, the effective
access time is 8.2 microseconds. The computer will be slowed down by a factor
of 40 because of demand paging! "
How did they calculate the slowdown here? Are 'performance degradation' and 'slowdown' the same thing?
This whole thing is nonsensical. It assumes a fixed page-fault rate p. That is not realistic in itself. That rate is the fraction of memory accesses that result in a page fault.
1 - p is the fraction of memory accesses that do not result in a page fault.
T = (1 - p) x 200 ns + p x (8 ms) is then the average time of a memory access.
Expanded:
T = 200 ns + p x (8 ms - 200 ns)
T = 200 ns + p x (7,999,800 ns)
The whole thing is rather silly.
All you really need to know is a nanosecond is 1/billionth of a second.
A millisecond is 1/thousandth of a second.
Using these figures, there is a factor of a million difference between the access time in memory and in disk.
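To answer the original "how did they calculate the slowdown" question directly, here is the book's arithmetic for p = 1/1000; the factor of 40 is just the ratio of the effective access time to the plain 200 ns memory access time:

    # Sketch of the textbook numbers.
    mem_ns = 200
    fault_ns = 8e6            # 8 ms page-fault service time, in ns
    p = 1 / 1000              # one access in 1,000 causes a page fault

    eat = (1 - p) * mem_ns + p * fault_ns
    print(eat)                # 8199.8 ns, i.e. about 8.2 microseconds
    print(eat / mem_ns)       # ~41, i.e. "slowed down by a factor of 40"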

Interrupt time in DMA operation

I'm facing difficulty with the following question:
Consider a disk drive with the following specifications.
16 surfaces, 512 tracks/surface, 512 sectors/track, 1 KB/sector, rotation speed 3000 rpm. The disk is operated in cycle stealing mode whereby whenever 1 byte word is ready it is sent to memory; similarly for writing, the disk interface reads a 4 byte word from the memory in each DMA cycle. Memory Cycle time is 40 ns. The maximum percentage of time that the CPU gets blocked during DMA operation is?
the solution to this question provided on the only site is:
Revolutions Per Min = 3000 RPM
or 3000/60 = 50 RPS
In 1 Round it can read = 512 KB
No. of tracks read per second = (2^19/2^2)*50
= 6553600 ............. (1)
Interrupt = 6553600 takes 0.2621 sec
Percentage Gain = (0.2621/1)*100
= 26 %
I have understood up to (1).
Can anybody explain to me where the 0.2621 comes from? How is the interrupt time calculated? Please help.
Reversing from the numbers you've given, it's 6553600 * 40 ns that gives 0.2621 sec.
One quite obvious problem is that the comments in the calculations are somewhat wrong. It's not
Revolutions Per Min = 3000 RPM ~ or 3000/60 = 50 RPS
In 1 Round it can read = 512 KB
No. of tracks read per second = (2^19/2^2)*50 <- WRONG
The numbers are 512K / 4 * 50. So, it's in bytes. How could that be called 'number of tracks'? Reading the full track takes 1 full rotation, so the number of tracks readable in 1 second is 50, as there are 50 RPS.
However, the total bytes readable in 1s is then just 512K * 50 since 512K is the amount of data on the track.
But then it is further divided by 4..
So, I guess, the actual comments should be:
Revolutions Per Min = 3000 RPM ~ or 3000/60 = 50 RPS
In 1 Round it can read = 512 KB
Interrupts per second = (2^19/2^2) * 50 = 6553600 (*)
Interrupt triggers one memory op, so then:
total wasted: 6553600 * 40ns = 0.2621 sec.
However, I don't really like how the 'number of interrupts per second' is calculated. I currently don't see/feel/guess how or why it's just bytes / 4.
The only VAGUE explanation of that "divide it by 4" I can think of is:
At each byte written to the controller's memory, an event is triggered. However, the DMA controller can read only PACKETS of 4 bytes. So, the hardware DMA controller must WAIT until there are at least 4 bytes ready to be read. Only then does the DMA kick in and halt the bus (or part of it) for the duration of one memory cycle needed to copy the data. As the bus is frozen, the processor MAY have to wait. It doesn't NEED to; it can keep doing its own ops and working out of cache, but if it tries touching memory, it will need to wait until the DMA finishes.
However, I don't like a few things in this "explanation". I cannot guarantee you that it is valid. It really depends on what architecture you are analyzing and how the DMA/CPU/BUS are organized.
The only mistake is that it's not
no. of tracks read
It's actually the no. of interrupts that occurred (the no. of times the DMA came up with its data; that is how many times the CPU will be blocked).
But again, I don't know why it is multiplied by 50, probably because of the 1 second, but I wish to solve this without multiplying by 50.
My Solution:
Here, in 1 rotation the interface can read 512 KB of data. 1 rotation time = 0.02 sec. So, one byte's data preparation time = 39.1 ns ----> for 4 B it takes 156.4 ns. Memory cycle time = 40 ns. So, the % of time the CPU gets blocked = 40/(40+156.4) = 0.2036 ≈ 20 %. But in the answer booklet the options are given as A) 10 B) 25 C) 40 D) 50. Tell me if I'm doing it wrong?
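For comparison, here is the arithmetic of the quoted 26% solution next to the ~20% approach directly above, written out in Python. Which reading is intended depends on whether the 40 ns memory cycle is counted on top of the word preparation time or simply once per transferred word, as discussed earlier. Note the sketch uses 512 KB = 524,288 bytes; the answer above appears to have used 512,000 bytes, which is how it got 39.1 ns, 156.4 ns, and ~20.4%:

    # Sketch comparing the two readings discussed in this thread.
    bytes_per_rev = 512 * 1024        # 512 KB read per revolution (one full track)
    rps = 3000 / 60                   # 50 revolutions per second
    mem_cycle_ns = 40

    # Quoted solution: one DMA memory cycle per 4-byte word transferred each second.
    dma_cycles_per_s = bytes_per_rev / 4 * rps            # 6,553,600
    blocked_s = dma_cycles_per_s * mem_cycle_ns * 1e-9    # ~0.2621 s of every second
    print(blocked_s * 100)                                # ~26.2 %

    # Alternative above: compare the 40 ns cycle with the time to prepare 4 bytes.
    prep_4b_ns = (1 / rps) / bytes_per_rev * 4 * 1e9      # ~152.6 ns per 4-byte word
    print(mem_cycle_ns / (mem_cycle_ns + prep_4b_ns) * 100)  # ~20.8 %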