Shift 32 bit Numbers on a 16 bit Datapath - cpu-architecture

How to Shift 32 Bit Numbers on a 16 Bit Datapath
This is a computer architecture question.
My datapath is only 16 bits wide, meaning my ALU can only process 16-bit operands at a time.
My registers are 32 bits wide and are addressable as lower and upper 16-bit halves.
Every time I read the lower half of a register, I also read an extra bit telling me whether the upper half contains any 1s at all (ref 1).
So far I have implemented shift left logical (sll rd, rs1, rs2):
Read the lower half of the rs1 register and shift it by the amount specified in the lower half of rs2.
The bits shifted out of these 16 bits are stored in a temporary 16-bit register inside the ALU.
The shifted value is written back to the lower half of rd, and its status bit is set (see ref 1).
If the upper half of rs1 contains no 1s (see ref 1) and the bits in the ALU temp register are all 0, the shift operation is done.
Otherwise, a second cycle for the upper half is needed:
Read the upper half of rs1 and shift it by the amount stored in the lower half of rs2.
This time, fill the vacated low-order bits with the values stored in the ALU temp register (instead of with 0s as in the first cycle).
Bits shifted out of the 16-bit space are dropped.
The result is written back to the upper half of rd, and the rd status bit is set (see ref 1).
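The two-cycle procedure above can be modeled in software as a quick sanity check. This is a minimal Python sketch, not a hardware description: the function name sll32_on_16bit is hypothetical, the status bit from ref 1 is computed directly from the upper half of rs1, and the shift amount is assumed to be in the range 0 to 15 (the procedure does not say what happens for larger amounts).

```python
MASK16 = 0xFFFF


def sll32_on_16bit(rs1, rs2):
    """Model of the two-cycle 32-bit sll on a 16-bit ALU.

    Returns (result, cycles). Hypothetical helper; assumes 0 <= shamt <= 15.
    """
    lo, hi = rs1 & MASK16, (rs1 >> 16) & MASK16
    status_rs1 = hi != 0          # extra bit delivered with the lower-half read (ref 1)
    shamt = rs2 & MASK16          # shift amount taken from the lower half of rs2
    if shamt > 15:
        raise ValueError("this sketch only models shift amounts 0..15")

    # Cycle 1: shift the lower half; bits pushed past bit 15 go to the ALU temp register.
    wide = lo << shamt
    rd_lo = wide & MASK16
    alu_tmp = wide >> 16
    if not status_rs1 and alu_tmp == 0:
        # rd's status bit is 0, so its upper half is treated as all zeros.
        return rd_lo, 1

    # Cycle 2: shift the upper half and fill the vacated low bits from the temp register.
    rd_hi = ((hi << shamt) | alu_tmp) & MASK16   # bits shifted past bit 15 are dropped
    return (rd_hi << 16) | rd_lo, 2


# The three examples from the question:
for rs1, rs2 in [(0x00001234, 2), (0x0000D234, 5), (0x01231234, 2)]:
    result, cycles = sll32_on_16bit(rs1, rs2)
    print(f"0x{rs1:08X} << {rs2} = 0x{result:08X} in {cycles} cycle(s)")
```

For the three examples this prints 0x000048D0 (1 cycle), 0x001A4680 (2 cycles) and 0x048C48D0 (2 cycles).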
Example 1: Let's say rs1 is 0x00001234 and rs2 is 0x00000002 (a left shift by 2).
First I read the lower 16 bits of rs1 and rs2, giving 0x1234 and 0x0002. This read also returns the status bits of both registers, here 0 for rs1 and 0 for rs2, since the upper 16 bits of both registers are all 0. With this data I can perform the left shift by 2, giving 0x48D0. Since no 1s were shifted out, I can store the result in the lower half of rd and set its status bit to 0. (This is all done in one cycle.)
Example 2: Let's say rs1 is 0x0000D234 and rs2 is 0x00000005 (a left shift by 5).
First I read the lower 16 bits of rs1 and rs2, giving 0xD234 and 0x0005. This read also returns the status bits of both registers, here 0 for rs1 and 0 for rs2, since the upper 16 bits of both registers are all 0. With this data I can perform the left shift by 5, giving 0x4680. But now 11010 (0x1A) has been shifted out of the 16-bit space. This value is stored in the ALU temp register, and since it contains 1s, I have to perform another cycle.
In the second cycle I read the upper half of rs1 and the lower half of rs2, giving 0x0000 and 0x0005. I perform another left shift by 5, but now the ALU temp register is used to fill the vacated bits: 0x0000 shifted left by 5 is 0x0000, and filling its low 5 bits with 0x1A gives 0x001A. This result is written back to the upper half of rd, which finishes the 32-bit sll (0x001A4680) in two 16-bit cycles.
Example 3: Let's say rs1 is 0x01231234 and rs2 is 0x00000002 (a left shift by 2).
First I read the lower 16 bits of rs1 and rs2, giving 0x1234 and 0x0002. This read also returns the status bits: 1 for rs1, since its upper 16 bits are non-zero, and 0 for rs2. Because the status bit of rs1 is 1, I have to perform a second cycle even though no 1s are shifted out of the lower 16 bits (compare Example 2). From here on it proceeds as in Example 2: the low result 0x48D0 is written back to rd, and a second cycle handles the upper bits (0x0123 shifted left by 2 gives 0x048C, so rd = 0x048C48D0).
I hope these examples make it clearer.
Now I want to implement a right shift (arithmetic and logical). How can I do that in at most 2 cycles if I have to read the lower half of rs1 first (including the status bit)?
Thanks for reading; this is my first question here, so please don't go too hard on me :D

Start by reading the upper half.
In the first cycle, read the upper half and do the right shift. The shifted-out bits will end up in the high part of the tmp register, and the result of the shift is written back.
In the second cycle, read the lower half, shift it, and OR it with the bits held in the tmp register. Then write back the result for this half.
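The answer's upper-half-first order can be sketched in Python for both variants of the right shift. This is a minimal model under assumptions not stated in the answer: the function srx32_high_first is hypothetical, shift amounts are limited to 0..15, and for the arithmetic variant the upper half is filled with copies of its sign bit (the answer only describes the data movement, not the fill). Note that it reads the upper half first, which differs from the read-lower-half-first constraint stated in the question.

```python
MASK16 = 0xFFFF


def to_signed16(x):
    """Interpret a 16-bit value as two's complement."""
    return x - 0x10000 if x & 0x8000 else x


def srx32_high_first(rs1, shamt, arithmetic=False):
    """32-bit right shift done as two 16-bit cycles, upper half first.

    Hypothetical helper; assumes 0 <= shamt <= 15.
    """
    if not 0 <= shamt <= 15:
        raise ValueError("this sketch only models shift amounts 0..15")
    lo, hi = rs1 & MASK16, (rs1 >> 16) & MASK16

    # Cycle 1: shift the upper half. Logical fills with 0s, arithmetic copies the sign bit.
    if arithmetic:
        rd_hi = (to_signed16(hi) >> shamt) & MASK16
    else:
        rd_hi = hi >> shamt
    # Bits that fall off the bottom of the upper half land in the high end of tmp.
    tmp = (hi << (16 - shamt)) & MASK16

    # Cycle 2: shift the lower half (always zero-filled) and OR in the saved bits.
    rd_lo = (lo >> shamt) | tmp
    return (rd_hi << 16) | rd_lo


print(hex(srx32_high_first(0x001A4680, 5)))                    # logical
print(hex(srx32_high_first(0x80000000, 4, arithmetic=True)))   # arithmetic
```

The logical shift undoes Example 2 from the question and prints 0xd234; the arithmetic shift of 0x80000000 by 4 prints 0xf8000000.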

Related

How to understand the physical address in this example?

The image relates to an example of virtual memory translation. The physical memory addresses run from 0x000 to 0x0FC, then from 0x100 to 0x1FC, and so on. Why don't they go 0x000 to 0x0FF, then 0x100 to 0x1FF, etc.? What do the two lowest bits stand for?
Thank you for your answers. This picture came from an MIT open course, and it didn't reveal more details about the address. But I finally figured it out from a later example in the course.
The two lowest bits are always zero, as the following example shows.
Suppose that we have:
4 GB of main memory (MM).
A cache with 64 lines.
Only 1 word = 4 bytes per cache line.
An address has 32 bits, because the MM is 4 GB (2^32 bytes).
The part of the address selecting the cache line (the index) has 6 bits, because there are 64 = 2^6 lines.
The cache size is 2^6 * 4 B = 2^8 B, so:
=> The tag has 24 bits (log2(4 GB / 2^8 B) = 32 - 8 = 24).
=> The lowest bits (the byte offset) have 2 bits (32 - 24 - 6).
Because there is only one word per block, the lowest bits, which act as a data boundary (this is what the course said), are always 0 for word addresses.
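The bit split in this example can be checked with a short Python sketch, assuming a direct-mapped cache (the line is selected by the 6 index bits, as in the example); the names are illustrative. Word-aligned addresses always have a byte offset of 0, and stepping by 4 from 0x000 to 0x0FC visits each of the 64 lines once, which is consistent with the 0x000 to 0x0FC ranges in the question.

```python
ADDRESS_BITS = 32   # 4 GB of main memory
LINES = 64          # cache lines
LINE_BYTES = 4      # one 4-byte word per line

OFFSET_BITS = LINE_BYTES.bit_length() - 1            # 2
INDEX_BITS = LINES.bit_length() - 1                  # 6
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS   # 24


def split_address(addr):
    """Split a byte address into (tag, line index, byte offset) for a direct-mapped cache."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


for addr in (0x000, 0x004, 0x0FC, 0x100):
    print(hex(addr), split_address(addr))
```

Here 0x0FC maps to line 63 with tag 0, and 0x100 wraps around to line 0 with tag 1.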

Reverse CAN BTR value from register value of stm32

I'm trying to get the baud rate of a chip by reverse engineering it.
The register value for BTR reads 0x23000B.
According to http://www.bittiming.can-wiki.info/, the values stored in the register are the real values minus 1. So it seems that:
SJW -> 0x0 -> becomes 1
TS2 -> 0x2 -> becomes 3
TS1 -> 0x3 -> becomes 4
prescaler -> 0xB -> 11d -> becomes 12d
So, if my decoding is correct (I can't find an official reference in any documentation for what the register should contain):
The chip in question has a 48 MHz clock.
So 48 MHz / prescaler => 48 MHz / 12 => 4 MHz
4,000,000 / (SJW + TS1 + TS2) => 500 kbps
Does this make any sense? Also, if you can find a reference to the register value in a PDF, I would greatly appreciate it.
Besides the calculation, I'm not sure about the 48 MHz clock.
A CAN bit is divided into time quanta (tq). The tq are clocked by your CAN prescaler clock, which needs to be accurate enough (<1% inaccuracy). When setting up the baud rate, you should strive to place the sample point close to 87.5% of the bit length, which comes from an industry standard (CANopen).
(If you are reverse-engineering something, the designers did not necessarily follow industry standards, and the sample point could be anywhere...)
Ideally 87.5% sample point is achieved by having a total of 16 tq, 14 tq before the sample point and 2 tq behind it. The desired baudrate is then obtained by:
1 tq fixed sync segment (can't be configured)
x tq propagation segment
y tq phase segment 1 (before sample point)
2 tq phase segment 2 (after sample point)
Different CAN controllers might name propagation segment + phase segment 1 as a single "propagation segment". It doesn't matter, it's the number of tq between the sync segment and the sample point that matters. One ideal example would be:
1 tq sync + 13 tq prop seg/phase seg 1 + 2 tq phase seg 2.
For a CAN clock of 4MHz this would give a bit rate of 4*10^6 / 16 = 250kbps.
Note that some CAN controllers do indeed expect you to subtract 1 tq from each segment length when you write to the register.
SJW, the (re)synchronization jump width, doesn't play a part in the baud rate calculation. It is a setting that gives a receiving node some room to re-sync in case of inaccurate baud rates. A "hard sync" is performed at the sync segment (bit edge) and then a re-sync is performed at the sample point; SJW allows some inaccuracy here. It is typically just set to 1, which works fine for all common baud rates. If you go up to 1 Mbps, it is recommended to increase SJW somewhat, to 2 or 3.
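To connect the register value in the question to the formula in this answer, here is a minimal Python sketch that decodes the BTR value. It assumes the bxCAN CAN_BTR bit layout used in STM32 reference manuals (BRP in bits 9:0, TS1 in bits 19:16, TS2 in bits 22:20, SJW in bits 25:24, each stored as value minus 1); check the reference manual for your specific part. The function names are hypothetical, and the 48 MHz CAN clock is the asker's assumption, not something the register tells you.

```python
def decode_bxcan_btr(btr):
    """Decode the timing fields of an STM32 bxCAN CAN_BTR value (each field is stored minus 1)."""
    return {
        "brp": (btr & 0x3FF) + 1,          # baud rate prescaler, bits 9:0
        "ts1": ((btr >> 16) & 0xF) + 1,    # time segment 1 in tq, bits 19:16
        "ts2": ((btr >> 20) & 0x7) + 1,    # time segment 2 in tq, bits 22:20
        "sjw": ((btr >> 24) & 0x3) + 1,    # resync jump width in tq, bits 25:24
    }


def bit_timing(btr, can_clock_hz):
    """Return (bit rate in bit/s, sample point as a fraction of the bit)."""
    f = decode_bxcan_btr(btr)
    tq_per_bit = 1 + f["ts1"] + f["ts2"]     # fixed 1 tq sync segment; SJW is not part of it
    bit_rate = can_clock_hz / (f["brp"] * tq_per_bit)
    sample_point = (1 + f["ts1"]) / tq_per_bit
    return bit_rate, sample_point


fields = decode_bxcan_btr(0x23000B)
rate, sp = bit_timing(0x23000B, 48_000_000)
print(fields)
print(f"{rate:.0f} bit/s, sample point {sp:.1%}")
```

For 0x23000B at 48 MHz this gives 500000 bit/s with a 62.5% sample point (1 sync + 4 TS1 tq before the sample point, out of 8 tq). The asker's 500 kbps comes out the same only because SJW (1) happens to equal the 1 tq sync segment; the bit time is sync + TS1 + TS2, not SJW + TS1 + TS2.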

How to calculate which virtual logical address corresponds to physical address?

Assume that the page table for the process currently running on the processor looks as shown in the figure below. All numbers are decimal, all numbering starts at 0, and all addresses are memory syllable addresses. The page size is 1024 bytes.
Which physical address (if any) does each of the following logical (virtual) addresses correspond to? Indicate whether a page fault occurs during translation.
a) 1085
b) 2321
c) 5409
Page number | Valid/invalid bit | Frame number
0 | 1 | 4
1 | 1 | 7
2 | 0 | -
3 | 1 | 2
4 | 0 | -
5 | 1 | 0
I don't want the solution for this problem, I want someone to explain how this kind of problems are solved.
I think you can infer most of the configuration from the question. I'll take a) as an example. Maybe you can tell me if I got the answer right, and then you can solve the rest yourself?
The first step is to determine which part of the virtual address is the page number (the index into the page table) and which part is the offset within the physical frame. With a page size of 1024 = 2^10 bytes, you need 10 bits for the offset within the frame, and the remaining bits give the page number.
1085 decimal = 0x43D = 0b100 0011 1101
The ten least significant bits (to the right) are the offset in the physical frame. That is 0b00 0011 1101 = 0x3D = 61 decimal. So now you know that the offset in the physical frame will be 61 bytes.
To calculate in what page this offset will be, you take the leftover bits (to the left). That is 0b1 = 0x1 = 1 decimal. This references page table entry 1. Page table entry 1 has the valid bit set. It means that the page is present in memory and will not cause a page fault. The page table entry points to frame number 7. There are 7 frames before frame 7: frames 0, 1, 2, 3, 4, 5, and 6. Thus this virtual address should translate to 7 * 1024 + 61 = 7229.
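The steps in this answer can be written as a short Python function using the page table from the question. The dictionary encoding (None for entries whose valid bit is 0) and the names translate and PageFault are illustrative choices, not part of the exercise. Only part a) is shown; b) and c) are left for you, as the asker wanted.

```python
PAGE_SIZE = 1024                          # bytes per page (2**10)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 10

# Page table from the question: page number -> frame number, None = valid bit is 0.
PAGE_TABLE = {0: 4, 1: 7, 2: None, 3: 2, 4: None, 5: 0}


class PageFault(Exception):
    """Raised when the page is not present in memory."""


def translate(vaddr):
    """Translate a virtual address to a physical address, or raise PageFault."""
    page = vaddr >> OFFSET_BITS          # upper bits: index into the page table
    offset = vaddr & (PAGE_SIZE - 1)     # lower 10 bits: offset within the frame
    frame = PAGE_TABLE.get(page)         # pages beyond the table are treated as faults too
    if frame is None:
        raise PageFault(f"page fault: page {page} (virtual address {vaddr})")
    return frame * PAGE_SIZE + offset


print(translate(1085))  # part a)
```

This prints 7229, matching the hand calculation above.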

MIPS R4000 Latency and Initiation Intervals

Why does the MIPS R4000 have a latency of 112 cycles and an initiation interval of 111 cycles for its square root functional unit?
The MIPS R4000 Microprocessor User’s Manual provides a somewhat detailed description of the R4000 floating point pipeline (see section 6.7). For floating point operations, the R4000 FPU provides eight operation stages (mantissa add, divide pipeline, exception test, first multiplier, second multiplier, rounding, operand shift, unpack FP numbers). Double precision square root uses the unpack FP numbers for the first cycle, the exception test for the second, both mantissa add and rounding for the next 108 cycles, mantissa add for the next cycle, and rounding for the last cycle.
Since the unpack FP numbers and exception test stages (the first two cycles) are not used in later cycles, a following square root operation can start two cycles earlier than if square root were completely unpipelined. This can be diagrammed as follows:
Cycle:          1  2  3    4    ...  110  111  112  113  114  115
First SQRT.D:   U  E  A+R  A+R  ...  A+R  A    R
Second SQRT.D:  110 stall cycles, then U  E  A+R  A+R in cycles 112 to 115
(You can see that the initiation interval counts the cycle when the first SQRT.D is issued, i.e., an initiation interval of zero would mean parallel issue and an initiation interval of one would support back-to-back issue.)

Shifting the Sign extended constant in MIPS

Why do we shift the sign-extended 16-bit constant left by 2 in MIPS branch instructions? I am confused by this idea. What good does this shift do for the sign-extended 16-bit constant? Here is the picture:
Regards
MIPS instructions are 32 bits = 4 bytes, so the branch offset is specified as a multiple of 4, i.e. a branch offset of 1 = 4 bytes. This gives four times the range of branch offsets that a byte offset would allow (since instructions are word-aligned, the two low bits of a byte offset would always be 0 and thus redundant). Shifting left by 2 is, of course, the same as multiplying by 4.
Any binary number shifted left twice is a multiple of 4. So by shifting the immediate left twice and adding it to the address of the next instruction (PC + 4), the branch target address is obtained.
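The target computation described in these answers can be sketched in Python (the helper names are illustrative). It sign-extends the 16-bit immediate, shifts it left by 2, and adds it to the address of the instruction after the branch (PC + 4).

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc, imm16):
    """Target of a MIPS conditional branch at address pc with 16-bit immediate imm16."""
    byte_offset = sign_extend16(imm16) << 2   # word offset -> byte offset (times 4)
    return (pc + 4 + byte_offset) & 0xFFFFFFFF


print(hex(branch_target(0x00400000, 3)))       # 3 instructions past PC + 4
print(hex(branch_target(0x00400010, 0xFFFF)))  # -1: back to the branch itself
```

Both calls print 0x400010. With the shift, the reachable offsets are -131072 to +131068 bytes from PC + 4; without it, a byte offset could only reach -32768 to +32767.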