Interrupt time in DMA operation - cpu-architecture

I'm facing difficulty with the following question :
Consider a disk drive with the following specifications .
16 surfaces, 512 tracks/surface, 512 sectors/track, 1 KB/sector, rotation speed 3000 rpm. The disk is operated in cycle stealing mode whereby whenever 1 byte word is ready it is sent to memory; similarly for writing, the disk interface reads a 4 byte word from the memory in each DMA cycle. Memory Cycle time is 40 ns. The maximum percentage of time that the CPU gets blocked during DMA operation is?
the solution to this question provided on the only site is :
Revolutions Per Min = 3000 RPM
or 3000/60 = 50 RPS
In 1 Round it can read = 512 KB
No. of tracks read per second = (2^19/2^2)*50
= 6553600 ............. (1)
Interrupt = 6553600 takes 0.2621 sec
Percentage Gain = (0.2621/1)*100
= 26 %
I have understood till (1).
Can anybody explain me how has 0.2621 come ? How is the interrupt time calculated? Please help .

Reversing form the numbers you've given, that's 6553600 * 40ns that gives 0.2621 sec.
One quite obvious problem is that the comments in the calculations are somewhat wrong. It's not
Revolutions Per Min = 3000 RPM ~ or 3000/60 = 50 RPS
In 1 Round it can read = 512 KB
No. of tracks read per second = (2^19/2^2)*50 <- WRONG
The numbers are 512K / 4 * 50. So, it's in bytes. How that could be called 'number of tracks'? Reading the full track is 1 full rotation, so the number of tracks readable in 1 second is 50, as there are 50 RPS.
However, the total bytes readable in 1s is then just 512K * 50 since 512K is the amount of data on the track.
But then it is further divided by 4..
So, I guess, the actual comments should be:
Revolutions Per Min = 3000 RPM ~ or 3000/60 = 50 RPS
In 1 Round it can read = 512 KB
Interrupts per second = (2^19/2^2) * 50 = 6553600 (*)
Interrupt triggers one memory op, so then:
total wasted: 6553600 * 40ns = 0.2621 sec.
However, I don't really like how the 'number of interrupts per second' is calculated. I currently don't see/fell/guess how/why it's just Bytes/4.
The only VAGUE explanation of that "divide it by 4" I can think of is:
At each byte written to the controller's memory, an event is triggered. However the DMA controller can read only PACKETS of 4 bytes. So, the hardware DMA controller must WAIT until there are at least 4 bytes ready to be read. Only then the DMA kicks in and halts the bus (or part of) for a duration of one memory cycle needed to copy the data. As bus is frozen, the processor MAY have to wait. It doesn't NEED to, it can be doing its own ops and work on cache, but if it tries touching the memory, it will need to wait until DMA finishes.
However, I don't like a few things in this "explanation". I cannot guarantee you that it is valid. It really depends on what architecture you are analyzing and how the DMA/CPU/BUS are organized.

The only mistake is its not
no. of tracks read
Its actually no. of interrupts occured (no. of times DMA came up with its data, these many times CPU will be blocked)
But again I don't know why 50 has been multiplied,probably because of 1 second, but I wish to solve this without multiplying by 50

My Solution:-
Here, in 1 rotation interface can read 512 KB data. 1 rotation time = 0.02 sec. So, one byte data preparation time = 39.1 nsec ----> for 4B it takes 156.4 nsec. Memory Cycle time = 40ns. So, the % of time the CPU get blocked = 40/(40+156.4) = 0.2036 ~= 20 %. But in the answer booklet options are given as A) 10 B)25 C)40 D)50. Tell me if I'm doing wrong ?

Related

ADXL345 Accelerometer data use on I2C (Beaglebone Black)

Background Information
I am trying to make sure I will be able to run two ADXL345 Accelerometers on the same I2C Bus.
To my understanding, the bus can transmit up to 400k bits/s on fast mode.
In order to send 1 byte of data, there are 20 extra bits of overhead.
There are 6 bytes per accelerometer reading (XLow, XHigh, YLow, YHigh, ZLow, ZHigh)
I need to do 1000 readings per second with both accelerometers
Thus,
My total data used per second is 336k bits/s which is within my limit of 400k bits/s.
I am not sure if I am doing these calculations correctly.
Question:
How much data am I transmitting per second with two accelerometers reading 1000 times per second on i2c?
Your math seems to be a bit off; for this accelerometer (from the datasheet: https://www.sparkfun.com/datasheets/Sensors/Accelerometer/ADXL345.pdf), in order to read the 6 bytes of XYZ sample data, you need to perform a 6-byte burst read of the registers. What this means in terms of data transfer is a write of the register address to the accelerometer (0x31) then a burst read of 6 bytes continuously. Each of these two transfers requires sending first the I2C device address and the R/W bit, as well as an ACK/NAK per byte, including the address bytes, as well as START/REPEAT START/STOP conditions. So, over all, an individual transfer to get a single sample (ie, a single XYZ acceleration vector) is as follows:
Start (*) | Device Address: 0x1D (7) | Write: 0 (1) | ACK (1) | Register Address: 0x31 (8) | ACK (1) | Repeat Start (*) | Device Address: 0x1D (7) | Read: 1 (1) | ACK (1) | DATA0 (8) | ACK(1) | DATA1 (8) | ACK (1) | ... | DATA5 (8) | NAK (1) | Stop (*)
If we add all that up, we get 81+3 bits of data that need to be transmitted. Note first that the START, REPEAT START and STOP might not actually take a bits worth of time each but for simplicity we can assume they do. Note also that while the device address is only 7 bits, you always need to postpend the READ/WRITE bit, so an I2C transaction is always 8 bits + ACK/NAK, so 9 bits in total. Note also, the I2C max transfer rate really defines the max SCK speed the device can handle, so in fast mode, the SCK is at most 400KHz (thus 400Kbps at most, but because of the protocol, you'll get less in real data). Thus, 84 bits at 400KHz means that we can transfer a sample in 0.21 ms or ~4700 samples/sec assuming no gaps or breaks in transmission.
Since you need to read 2 samples every 1ms (2 accelerometers, so 84 bits * 2 = 164 bits/sample or 164Kbps at 1KHz sampling rate), this should at least be possible for fast mode I2C. However, you will need to be careful that you are taking full use of the I2C controller. Depending on the software layer you are working on, it might be difficult to issue I2C burst reads fast enough (ie, 2 burst read transactions within 1ms). Using the FIFO on the accelerometer would significantly help the latency requirement, meaning instead of having 1ms to issue two burst reads, you can delay up to 32ms to issue 64 burst reads (since you have 2 accelerometers); but since you need to issue a new burst read to read the next sample, you'll have to be careful about the delay introduced by software between calls to whatever API youre using to perform the I2C transactions.

Loading a program from the disk

How long does it take to load a 64-KB program from a disk whose average seek time is 10 msec., whose rotation time is 20 msecs., and whose track holds 32-KB for a 2-KB page size?
The pages are spread randomly around the disk and the number of cylinders is so large
that the chance of two pages being on the same cylinder is negligible.
My solution ..
64 KB program will be organized into 2 tracks because of each track capacity is 32KB.
To load entire track we require 20msec. To load 2KB we require 1.25 msec.
I/O time =seek time+avg.rotation latency+transfer time
10msec+10msec+1.25msec=21.25msec
Since 64KB program is organized into 2 tracks then I/O time will be 2(21.25)=42.5 msec.
Is it correct? if so why seek time =avg rotetion latency?
As its mentioned that: the number of cylinders is so large that the chance of two pages being on the same cylinder is negligible., it simply means to load each page,we always need to move to a different cylinder. Therefore, we have to add seek timefor each page, because seek time is time it takes to move the arm over appropriate cylinder.
Number of pages = 64KB/2KB = 32
rotation time by definition is time it takes to get to the appropriate sector, once we have reached the appropriate cylinder.
So Time taken would be = 32*10*20 = 6400 msec
I found another solution .
The seek plus rotational latency is 20 msec.
For 2-KB pages, the transfer
time is 1.25 msec, for a total of 21.25 msec.
Loading 32 of these pages will
take 680 msec. For 4-KB pages, the transfer time is doubled to 2.5 msec, so
the total time per page is 22.50 msec. Loading 16 of these pages takes 360
msec.
Now I'm confused.

Average memory access time

I would like to know did I solve the equation correctly below
find the average memory access time for process with a process with a 3ns clock cycle time, a miss penalty of 40 clock cycle, a miss rate of .08 misses per instruction, and a cache access time of 1 clock cycle
AMAT = Hit Time + Miss Rate * Miss Penalty
Hit Time = 3ns, Miss Penalty = 40ns, Miss Rate = 0.08
AMAT = 3 + 0.08 * 40 = 6.2ns
Check the "Miss Penalty". Be more careful to avoid trivial mistakes.
The question that you tried to answer cannot actually be answered, since you are given 0.08 misses per instruction but you don't know the average number of memory accesses per instruction. In an extreme case, if only 8 percent of instructions accessed memory, then every memory access would be a miss.

Increasing/Altering Matlab-Arduino analogRead() sampling rate

I have been controlling Arduino from Matlab using ArduinoIO-Matlab interface. My current setup is I have 3 EMG Muscle Sensors (from Advancer Technologies) are connected to the Arduino at analog pin 1,2, and 3. Arduino is connected to Matlab. I am trying to collect data from these three pins simultaneously and store them in an matrix size 1000x3. My issue is the rate at which Matlab is sampling from the analog pin. It takes about 25 seconds to collect 1000 readings from the 3 pins simultaneously. I know arduino itself samples at a higher rate. Below is my code. How do I alter this to get a sampling rate of about like 1000 samples in 10 seconds ?
ar = arduino('COM3');
ax = zeros(1000,3);
for ai = 1:1000
ax(ai,:) = [ar.analogRead(1) ar.analogRead(2) ar.analogRead(3)];
end
delete(ar);
This is the time taken by the above code (profile viewer):
time calls line
< 0.01 1 3 ax = zeros(1000,3);
4
< 0.01 1 5 for ai = 1:1000
25.07 1000 6 ax(ai,:) = [ar.analogRead(1) ar.analogRead(2) ar.analogRead(3)];
1000 7 end
8
1.24 1 9 delete(ar);
Please let me know if there is something else that I need to clarify.
Thanks :Denter code here
You need to modify the arduino c++ code (.pde file).
In this code you should sample the signal as you prefer (1000 for example) and then transfer the sampled data to matlab using serial.writeln() method.
This will give you a sampling rate of ~3KHz (depending on alot of factors)...
The following very probably explains the result that you are seeing and why you need to do something like what Muhammad's answer suggests. While this reason was implied by his answer it was not spelt out so that others can avoid the 'trap'.
I do not have access to the underlying code and systems needed to check this answer with certainty. This answer is based on "typical methods" and has a modest chance of being sheer poppycock [tm], but the exact fit between observation and standard methods suggests this is what is happening. A very little delving by someone with the requisite system to hand will demonstrate if this is correct.
When data is sent one data sample at a time you incur a per-sample overhead significantly in excess of the time taken to just transfer the raw data.
You say it takes 25 seconds to transfer 3000 samples.
The time per sample = 25/3000 = 8.333 ms per sample.
Assume a 9600 baud data transfer rate.
The default communications speed is liable to but 9600 baud. This can be checked but the result suggests that this may be correct and making slightly different assumptions provides an equally good explanation.
Serial coms usually uses N81 format = 1 start bit, 8 data bits, 1 stop bit per 8 bit byte.
So 1 bit takes 1/9600 s
and 10 bits take 10/9600 = 1.042 mS
And sample time / byte time
= 8.333 / 1.042 = 7.997 word times.
In fact if you do the calculations without rounding or truncation, ie
25 / 3000 x 9600/10 = 8.000.... .
ie your transfer is taking EXACTLY 8 x 9600 baud word times per sample.
Equally, this is exactly 4 x 4800 baud or 2 x 2400 baud transfer times.
I have not examined the format used but imagine that to work with the PC monitor program the basic serial routine may use
2 x data bytes + CR + LF = 4 bytes.
That assumes a 16 bit variable sent as 2 x 8 bit binary words.
More likely = either
- 16 bits sent as 4 x ASCII characters or
- 24 bits sent as 6 x ASCII characters.
In the absence of suitably deep delving, the use of 6 ASCII words and a CR + LF at 9600 baud provides such a good fit using typical parameters that Occam probably opines that this is the best starting point. Regardless of whether the total requirement is 8 or 4 or 2 bytes, the somewhat serendipitous exact match between your observed data rate and standard baud rates suggests that this provides the basic reason for what you see.
Looking at the code will rapidly show what baud rate, data length and packing is used.

Main memory bandwidth measurement

I want to measure the main memory bandwidth and while looking for the methodology, I found that,
many used 'bcopy' function to copy bytes from a source to destination and then measure the time which they report as the bandwidth.
Others ways of doing it is to allocate and array and walk through the array (with some stride) - this basically gives the time to read the entire array.
I tried doing (1) for data size of 1GB and the bandwidth I got is '700MB/sec' (I used rdtsc to count the number of cycles elapsed for the copy). But I suspect that this is not correct because my RAM config is as follows:
Speed: 1333 MHz
Bus width: 32bit
As per wikipedia, the theoretical bandwidth is calculated as follows:
clock speed * bus width * # bits per clock cycle per line (2 for ddr 3
ram) 1333 MHz * 32 * 2 ~= 8GB/sec.
So mine is completely different from the estimated bandwidth. Any idea of what am I doing wrong?
=========
Other question is, bcopy involves both read and write. So does it mean that I should divide the calculated bandwidth by two to get only the read or only the write bandwidth? I would like to confirm whether the bandwidth is just the inverse of latency? Please suggest any other ways of measuring the bandwidth.
I can't comment on the effectiveness of bcopy, but the most straightforward approach is the second method you stated (with a stride of 1). Additionally, you are confusing bits with bytes in your memory bandwidth equation. 32 bits = 4bytes. Modern computers use 64 bit wide memory buses. So your effective transfer rate (assuming DDR3 tech)
1333Mhz * 64bit/(8bits/byte) = 10666MB/s (also classified as PC3-10666)
The 1333Mhz already has the 2 transfer/clock factored in.
Check out the wiki page for more info: http://en.wikipedia.org/wiki/DDR3_SDRAM
Regarding your results, try again with the array access. Malloc 1GB and traverse the entire thing. You can sum each element of the array and print it out so your compiler doesn't think it's dead code.
Something like this:
double time;
int size = 1024*1024*1024;
int sum;
*char *array = (char*)malloc(size);
//start timer here
for(int i=0; i < size; i++)
sum += array[i];
//end timer
printf("time taken: %f \tsum is %d\n", time, sum);