Shifting registers to the right in the Mic-1 without fetches

This is my first question here, so feel free to give me feedback if something is not described properly. To the actual question:
I was wondering if there is a way to shift a word in one of the registers TO THE RIGHT by 2 bytes without fetches or the arithmetic shifter. (Example with fetch: write the word to memory address 0x0, fetch 0x0, shift it left by 8 and copy it into OPC or whatever, then OR it into the desired register; then fetch address 0x1 and OR it into the register again, this time without shifting left.)
So a register containing 0xcccc1111 should become 0x0000cccc.
Here is a small description of the micro-architecture of the mic-1.
Thanks for the help
The purpose is to copy a word that starts from memory 0x2 into LV in a more efficient fashion: this should work but it uses both fetch and write and it's probably complete trash :(
MAR=0; rd;
PC=PC+H; fetch;
MDR= MDR <<8; rd;
LV=MDR<<8; rd;
H=MBRU << 8; fetch;


QSPI connection on STM32 microcontrollers with other peripherals instead of Flash memories

I am starting a project that needs the QSPI protocol. The component I will use is a 16-bit ADC that supports QSPI with all combinations of clock phase and polarity. Unfortunately, I couldn't find any source on the internet that covers using QSPI on STM32 with components other than Flash memories. Now, my question: can I use the STM32's QSPI peripheral to communicate with other devices that support QSPI? Or is it designed only for memories?
The ADC component I want to use is: ADS9224R (16-bit, 3MSPS)
Here is the image of the datasheet that illustrates this device supports the full QSPI protocol.
Many thanks
page 33 of the datasheet
The STM32 QSPI can work in several modes. The Memory Mapped mode is specifically designed for memories. The Indirect mode, however, can be used for any peripheral. In this mode you can specify the format of the commands that are exchanged: presence of an instruction, of an address, of data, etc.
See register QUADSPI_CCR.
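To make the idea of "specifying the format" concrete, here is a minimal sketch of how a QUADSPI_CCR value could be composed for an indirect transaction. It is plain C with no hardware access. The bit positions follow the QUADSPI_CCR layout as I understand it from ST's reference manuals, so check them against the manual for your part. The function name, the enums, and the example opcode 0xEB are assumptions made for illustration, not something taken from the posts here.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the QUADSPI_CCR fields (verify against your reference manual). */
#define CCR_INSTRUCTION_POS 0U   /* 8-bit instruction opcode */
#define CCR_IMODE_POS       8U   /* lines used for the instruction phase */
#define CCR_ADMODE_POS      10U  /* lines used for the address phase */
#define CCR_ADSIZE_POS      12U  /* address size in bytes, minus one */
#define CCR_DCYC_POS        18U  /* number of dummy cycles */
#define CCR_DMODE_POS       24U  /* lines used for the data phase */
#define CCR_FMODE_POS       26U  /* functional mode */

/* Encoding shared by IMODE, ADMODE and DMODE (hypothetical names). */
enum qspi_lines { QSPI_NONE = 0, QSPI_1_LINE = 1, QSPI_2_LINES = 2, QSPI_4_LINES = 3 };
enum qspi_fmode { QSPI_INDIRECT_WRITE = 0, QSPI_INDIRECT_READ = 1 };

/* Compose the CCR value for one indirect transaction. */
uint32_t qspi_build_ccr(enum qspi_fmode fmode,
                        enum qspi_lines imode,
                        enum qspi_lines admode,
                        uint32_t addr_bytes,
                        uint32_t dummy_cycles,
                        enum qspi_lines dmode,
                        uint8_t instruction)
{
    assert(addr_bytes >= 1U && addr_bytes <= 4U);
    assert(dummy_cycles <= 31U);

    return ((uint32_t)fmode << CCR_FMODE_POS)
         | ((uint32_t)dmode << CCR_DMODE_POS)
         | (dummy_cycles << CCR_DCYC_POS)
         | ((addr_bytes - 1U) << CCR_ADSIZE_POS)
         | ((uint32_t)admode << CCR_ADMODE_POS)
         | ((uint32_t)imode << CCR_IMODE_POS)
         | ((uint32_t)instruction << CCR_INSTRUCTION_POS);
}
```

For a device that is not a memory, the same fields apply: set a phase to QSPI_NONE when the device's datasheet says that phase is absent (for example, no address phase), and pick the line count of each phase from the device's timing diagrams.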
QUADSPI supports indirect mode, where for each data transaction you manually specify the command, the number of bytes in the address part, the number of data bytes, the number of lines used for each part of the communication, and so on. I don't know whether HAL supports all of that. It would probably be more efficient to work directly with the QUADSPI registers: there are simply too many levers and controls you need to set up, and if the library is missing something, things may not work as you want, and QUADSPI is pretty unpleasant to debug. Luckily, after the initial setup, you probably won't need to change very much in its settings.
In fact, some time ago, when I was learning QUADSPI, I wrote my own indirect read/write for QUADSPI flash. Purely a demo program for myself. With a bit of tweaking it shouldn't be hard to adapt it. From my personal experience, QUADSPI is a little hard at first; I spent a couple of weeks debugging it with a logic analyzer until I got it to work. Or maybe it was due to my general inexperience.
Below you can find one of my functions, which can be used after initial setup of QUADSPI. Other communication functions are around the same length. You only need to set some settings in a few registers. Be careful with the order of your register manipulations - there is no "start communication" flag/bit/command. Communication starts automatically when you set some parameters in specific registers. This is explicitly stated in the reference manual, QUADSPI section, which was the only documentation I used to write my code. There is surprisingly limited information on QUADSPI available on the Internet, even less with registers.
Here is a piece from my basic example code on registers:
void QSPI_readMemoryBytesQuad(uint32_t address, uint32_t length, uint8_t destination[]) {
while (QUADSPI->SR & QUADSPI_SR_BUSY); //Make sure no operation is going on
QUADSPI->DLR = length - 1U; //Set number of bytes to read
QUADSPI->CR = (QUADSPI->CR & ~(QUADSPI_CR_FTHRES)) | (0x00 << QUADSPI_CR_FTHRES_Pos); //Set FIFO threshold to 1
* Set communication configuration register
* Functional mode: Indirect read
* Data mode: 4 Lines
* Instruction mode: 4 Lines
* Address mode: 4 Lines
* Address size: 24 Bits
* Dummy cycles: 6 Cycles
* Instruction: Quad Output Fast Read
* Set 24-bit Address
(0x06 << QUADSPI_CCR_DCYC_Pos) |
QUADSPI->AR = (0xFFFFFF) & address;
/* ---------- Communication Starts Automatically ----------*/
*destination = *((uint8_t*) &(QUADSPI->DR)); //Read a byte from data register, byte access
It is a little crude, but it may be a good starting point for you, and it's well-tested and definitely works. You can find all my functions here (GitHub). Combine it with reading the QUADSPI section of the reference manual, and you should start to get a grasp of how to make it work.
Your job will be to determine what kind of commands you need to send to your QSPI slave device, and in what format. That information is available in the device's datasheet. Make sure you send the command, the address and every other part on the correct number of QUADSPI lines. For example, sometimes you need to have the command on 1 line and the data on all 4, all in the same transaction. Make sure you set dummy cycles if they are required for some operation. Pay special attention to how you read the data that you receive via QUADSPI. You can read it in 32-bit words at once (if the incoming data is a whole number of 32-bit words). In my case, in the function provided here, I read it by individual bytes, hence the scary-looking *destination = *((uint8_t*) &(QUADSPI->DR));, where I take the address of the data register, cast it to a pointer to uint8_t and dereference it. Otherwise, if you read DR just as QUADSPI->DR, your MCU reads a 32-bit word for every byte that arrives, and QUADSPI goes crazy and hangs and shows various errors and triggers FIFO threshold flags and stuff. Just be mindful of how you read that register.

MPU-6050 Burst Read Auto Increment

I'm trying to write a driver for the MPU-6050 and I'm stuck on how to proceed regarding reading the raw accelerometer/gyroscope/temperature readings. For instance, the MPU-6050 has the accelerometer X readings in 2 registers: ACCEL_XOUT[15:8] at address 0x3B and ACCEL_XOUT[7:0] at address 0x3C. Of course to read the raw value I need to read both registers and put them together.
In the description of the registers (in the register map and description sheet), it says that to guarantee readings from the same sampling instant I must use burst reads, because as soon as an idle I2C bus is detected, the sensor registers are refreshed with new data from a new sampling instant. The datasheet snippet shows the simple I2C burst read:
However, this approach (to the best of my understanding) would only work for reading the ACCEL_X registers from the same sampling instant if auto-increment were supported (such that the first DATA in the above sequence would be from ACCEL_XOUT[15:8] @ address 0x3B and the second DATA would be from ACCEL_XOUT[7:0] @ address 0x3C). But the datasheet only mentions that I2C burst writes support the auto-increment feature. Without auto-increment on the I2C read side, how would I go about reading two different registers whilst maintaining the same sampling instant?
I also recognize that I could use the sensor's FIFO feature or the interrupt to accomplish what I'm after, but (for my own curiosity) I would like a solution that didn't rely on either.
I have the same problem; it looks like the documentation on this topic is incomplete.
Reading single sample
I think you can burst read the ACCEL_*OUT_*, TEMP_OUT_* and GYRO_*OUT_* registers. In fact, I tried reading the data one register at a time, but I got frequent data corruption.
Then, just to try, I requested 6 bytes from ACCEL_XOUT_H, 6 bytes from GYRO_XOUT_H and 2 bytes from TEMP_OUT_H and... it worked! No more data corruption!
I think they simply forgot to mention this in the register map.
How to
Here is some example code that can work in the Arduino environment.
These are the functions that I use. They are not very safe, but they work for my project:
inline void requestBytes(byte SUB, byte nVals)
Wire.requestFrom(SAD, nVals);
while (Wire.available() == 0);
inline byte getByte(void)
inline void stopRead(void)
byte readByte(byte SUB)
requestBytes(SUB, 1);
byte result = getByte();
return result;
void readBytes(byte SUB, byte* buff, byte count)
requestBytes(SUB, count);
for (int i = 0; i < count; i++)
buff[i] = getByte();
At this point, you can simply read the values in this way:
// burst read the registers using auto-increment:
byte data[6];
readBytes(ACCEL_XOUT_H, data, 6);
// convert the data:
acc_x = (data[0] << 8) | data[1];
// ...
Looks like this cannot be done for other registers. For example, to read the FIFO_COUNT_* I have to do this (otherwise I get incorrect results):
uint16_t FIFO_size(void)
{
    byte bytes[2];
    // this does not work
    //readBytes(FIFO_COUNT_H, bytes, 2);
    // read the high and low count registers one at a time
    bytes[0] = readByte(FIFO_COUNT_H);
    bytes[1] = readByte(FIFO_COUNT_L);
    return unisci_bytes(bytes[0], bytes[1]);
}
Reading the FIFO
Looks like the FIFO works differently: you can burst read by simply requesting multiple bytes from the FIFO_R_W register and the MPU6050 will give you the bytes in the FIFO without incrementing the register.
I found this example where they use I2Cdev::readByte(SAD, FIFO_R_W, buffer) to read a given number of bytes from the FIFO and if you look at I2Cdev::readByte() (here) it simply requests N bytes from the FIFO register:
// ... send FIFO_R_W and request N bytes ...
for(...; ...; count++)
data[count] = Wire.receive();
// ...
How to
This is simple since the FIFO_R_W does not auto-increment:
byte data[12];
void loop() {
// ...
readBytes(FIFO_R_W, data, 12); // <- replace 12 with your burst size
// ...
Using FIFO_size() is very slow!
Also my advice is to use 400kHz I2C frequency, which is the MPU6050's maximum speed
Hope it helps ;)
As Luca says, the burst read semantic seems to be different depending on the register the read operation starts at.
Reading consistent samples
To read a consistent set of raw data values, you can use the method I2C.readRegister(int, ByteBuffer, int) with register number 59 (ACCEL_XOUT[15:8]) and a length of 14 to read all the sensor data (ACCEL, TEMP, and GYRO) in one operation and get consistent data.
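As an illustration of what to do with those 14 bytes once they have arrived, here is a minimal C sketch that splits the burst into signed 16-bit values. It assumes the register order from the register map (accelerometer X/Y/Z, then temperature, then gyroscope X/Y/Z, each high byte first). The struct and function names are made up for this example, and the I2C transfer itself is left to whatever library you use.

```c
#include <stdint.h>

/* One consistent sample, as raw register values (hypothetical type). */
typedef struct {
    int16_t accel[3]; /* X, Y, Z */
    int16_t temp;
    int16_t gyro[3];  /* X, Y, Z */
} mpu_sample_t;

/* Join a high byte and a low byte into one signed 16-bit value. */
static int16_t be16(const uint8_t *p)
{
    return (int16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Split a 14-byte burst that started at register 59 (0x3B). */
void mpu_parse_burst(const uint8_t buf[14], mpu_sample_t *out)
{
    for (int i = 0; i < 3; i++) {
        out->accel[i] = be16(&buf[2 * i]);     /* bytes 0..5  */
        out->gyro[i]  = be16(&buf[8 + 2 * i]); /* bytes 8..13 */
    }
    out->temp = be16(&buf[6]);                 /* bytes 6..7  */
}
```

Keeping the parsing separate from the bus access makes it easy to check on a PC with canned byte sequences before involving the sensor.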
Burst read of FIFO data
However, if you use the FIFO buffer of the chip, you can start the burst read with the same method signature on register 116 (FIFO_R_W) to read the given amount of data from the chip-internal fifo buffer. Doing so you must keep in mind that there is a limit on the number of bytes that can be read in one burst operation. If I'm interpreting right, a maximum of 31 bytes can be read in a single burst operation.

Very few write cycles in stm32f4

I'm using an STM32F401VCT6U "discovery" board, and I need to provide a way for the user to write to memory addresses at runtime.
I wrote what can be simplified to the following function:
uint8_t Write(uint32_t address, uint8_t* values, uint8_t count)
uint8_t index;
for (index = 0; index < count; ++index) {
if (IS_FLASH_ADDRESS(address+index)) {
/* flash write */
if (FLASH_ProgramByte(address+index, values[index]) != FLASH_COMPLETE) {
} else {
/* ram write */
((uint8_t*)address)[index] = values[index];
return NO_ERROR;
In the above, address is the base address, values is a buffer of size at least count which contains the bytes to write to memory and count the number of bytes to write.
Now, my problem is the following: when the above function is called with a base address in flash and count=100, it works normally the first few times, writing the passed values buffer to flash. After those first few calls however, I cannot write just any value anymore: I can only reset bits in the values in flash, eg an attempt to write 0xFF to 0x7F will leave 0x7F in the flash, while writing 0xFE to 0x7F will leave 0x7E, and 0x00 to any value will be successful (but no other value will be writable to the address afterwards).
I can still write normally to other addresses in the flash by changing the base address, but again only a few times (two or three calls with count=100).
This behaviour suggests that the maximum write count of the flash has been reached, but I cannot imagine it can be so fast. I'd expect at the very least 10,000 writes before exhaustion.
So what am I doing wrong?
You have misunderstood how flash works. It is not as straightforward as, for example, writing EEPROM. The behaviour you are describing is normal for flash.
To repeatedly write the same address of flash, the whole sector must first be erased using FLASH_EraseSector. Generally, any data that needs to be preserved during this erase needs to be buffered either in RAM or in another flash sector.
If you are repeatedly writing a small block of data and are worried about flash burnout due to too many erase/write cycles, you would want to write an interface to the flash where, on each write, you move your data along the flash sector to unwritten flash, keeping track of its current offset from the start of the sector. Only when you run out of bytes in the sector would you need to erase it and start again at the start of the sector.
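The "move along the sector" idea can be sketched in plain C against a simulated sector, which makes the bookkeeping easy to see. Everything here is an assumption for illustration: the RAM array stands in for a flash sector, the tiny sector size is only there to make the wrap-around visible, and on real hardware the erase and program steps would go through the flash driver (for example FLASH_EraseSector and FLASH_ProgramByte).

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 16U   /* tiny on purpose; real sectors are kilobytes */
#define ERASED      0xFFU /* value of every byte after an erase */

typedef struct {
    uint8_t  cells[SECTOR_SIZE]; /* stand-in for one flash sector */
    uint32_t next;               /* offset of the first unwritten byte */
    uint32_t erase_count;        /* how many erases were needed so far */
} flash_log_t;

void log_init(flash_log_t *log)
{
    memset(log->cells, ERASED, sizeof log->cells);
    log->next = 0;
    log->erase_count = 0;
}

/* Erase the whole sector and start again at offset 0. */
static void log_erase(flash_log_t *log)
{
    memset(log->cells, ERASED, sizeof log->cells);
    log->next = 0;
    log->erase_count++;
}

/* Store a new value in the next unwritten byte; erase only when full. */
void log_append(flash_log_t *log, uint8_t value)
{
    if (log->next == SECTOR_SIZE) {
        log_erase(log);
    }
    log->cells[log->next] &= value; /* programming can only clear bits */
    log->next++;
}

/* Most recently stored value, or ERASED if nothing was stored yet. */
uint8_t log_latest(const flash_log_t *log)
{
    return (log->next == 0) ? ERASED : log->cells[log->next - 1];
}
```

With this layout, one erase is spent per SECTOR_SIZE updates instead of one per update. A real implementation would also have to recover the offset after a reset, for example by scanning for the first erased byte, which this sketch leaves out.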
ST's "right way" is detailed in AN3969: EEPROM emulation in STM32F40x/STM32F41x microcontrollers
This is more or less the process:
1. Reserve two Flash pages.
2. Write the latest data to the next available location, along with its 'EEPROM address'.
3. When you run out of room on the first page, write all of the latest values to the second page and erase the first.
4. Begin writing values where you left off on page 2.
5. When you run out of room on page 2, repeat on page 1.
This is insane, but I didn't come up with it.
I have a working and tested solution, but it is rather different from @Ricibob's answer, so I decided to make this an answer.
Since my user can write anywhere in a selected flash sector, my application cannot take on the responsibility of erasing the sector when needed while buffering in RAM only the data that needs to be preserved.
As a result, I transferred to my user the responsibility of erasing the sector when a write to it doesn't work (this way, the user remains free to use another address in the sector to avoid too many write-erase cycles).
Basically, I expose a write(uint32_t startAddress, uint8_t count, uint8_t* values) function that returns WRITE_SUCCESSFUL on success and CANNOT_WRITE_FLASH on failure.
I also provide my user with a getSector(uint32_t address) function that returns the id, start address and end address of the sector corresponding to the address passed as a parameter. This way, the user knows what range of addresses is affected by the erase operation.
Lastly, I expose an eraseSector(uint8_t sectorID) function that erases the flash sector whose id has been passed as a parameter.
Erase Policy
The policy for a failed write is different from @Ricibob's suggestion of "erase if the value in flash is different from FF", as it is documented in the Flash programming manual that a write will succeed as long as it only resets bits (which matches the behavior I observed in the question):
Note: Successive write operations are possible without the need of an erase operation when
changing bits from ‘1’ to ‘0’.
Writing ‘1’ requires a Flash memory erase operation.
If an erase and a program operation are requested simultaneously, the erase operation is
performed first.
So I use the macro CAN_WRITE(a,b), where a is the original value in flash and b the desired value. The macro is defined as:
(!(~(a) & (b)))
which works because:
the logical not (!) transforms 0 to true and everything else to false, so ~a & b must equal 0 for the macro to be true;
any bit at 1 in a is at 0 in ~a, so the corresponding bit of ~a & b is 0 whatever its value in b is (you can turn a 1 into a 1 or a 0);
if a bit is 0 in a, then it is 1 in ~a; if the corresponding bit of b is 1, then ~a & b != 0 and we cannot write; if it is 0, it's OK (you can only turn a 0 into a 0, not into a 1).
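The rule above can be checked on a PC with a small C sketch. It assumes, as the behaviour observed in the question suggests, that programming a byte without erasing leaves the bitwise AND of the old and new values in flash. The function names are hypothetical; they mirror the CAN_WRITE macro rather than any ST library call.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when 'desired' can be programmed over 'current' without an erase,
 * i.e. when no bit would have to go from 0 back to 1. */
bool can_write(uint8_t current, uint8_t desired)
{
    return (uint8_t)(~current & desired) == 0;
}

/* What ends up in the cell if we program anyway (assumed AND behaviour). */
uint8_t program_result(uint8_t current, uint8_t desired)
{
    return (uint8_t)(current & desired);
}

/* Program a simulated cell, refusing writes that would need an erase. */
bool try_program(uint8_t *cell, uint8_t desired)
{
    if (!can_write(*cell, desired)) {
        return false; /* caller has to erase the sector first */
    }
    *cell = program_result(*cell, desired);
    return true;
}
```

For the values from the question: can_write(0x7F, 0xFE) is false, and program_result(0x7F, 0xFE) gives 0x7E, which is what was observed in flash.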
List of flash sectors in STM32F4
Lastly and for future reference (as it is not that easy to find), the list of sectors of flash in STM32 can be found on page 7 of the Flash programming manual.

Base + Offset Addressing Mode

I just need an explanation of how base + offset addressing modes work. Having trouble finding a clear-cut answer for this. (I've been working with the LC-3, not sure if that matters). A simple example would also be helpful.
Thank you!
EBP is the base here (it holds the base address), for instance 00402000,
so EAX will be loaded with the value at the address [00402000+8], i.e. 00402008
Base + Index addressing mode
2 Registers specify the address of an operand in an instruction.
Add the numerical values stored in those registers to get the complete address of an operand.
A = 1000
Register A = 1000
Register B = 8
MOV C, [A,B] => C = contents of location A+B
There is a flavor of Base + Index addressing called Base + Index + Displacement.
Displacement = an immediate value in the instruction that is added to the Base + Index.
That's what you see in your opcode.
instruction = OPCODE + Operand 1 Register Spec + Operand 2's Base Register Spec + Operand 2's Index Register Spec + Immediate value.
Imagine a microprocessor with 8 registers (so 3 bits select a register).
So a 16-bit instruction may have
4 bits for the opcode
3 bits for the base register
3 bits for the index register
6 bits for the immediate displacement.
I believe I have figured out the answer. I'll post it here in case it helps anyone else who has trouble with this. I found the answer hidden deep in the 100 slide powerpoint that my teacher provided XD
This is what happens when performing an LDR using a base register R6 and destination register R2:
MAR<-R6 + IR[5:0]
Let's say R6 = x3000, IR[5:0] = x5, and R2 = 0 (although this value doesn't matter since it will be loaded with another value at the end).
MAR<-R6 + IR[5:0]
R6 is added to IR[5:0] (which is the offset value in the last six bits of the LDR instruction). The base x3000 (value of R6) has x5 (value of IR[5:0]) added to it, giving us x3005. The MAR (memory address register) now holds x3005.
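MDR<-M[MAR]
The contents of the memory location whose address is in the MAR (x3005) are loaded into the MDR (memory data register).
R2<-MDR
The value in the MDR (the contents of memory location x3005) is loaded into R2. R2 now holds the value stored at x3005, not the address x3005 itself.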
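The same LDR steps can be written out as a small C model, in case that helps. It assumes the standard LC-3 behaviour where the 6-bit offset in IR[5:0] is sign-extended before it is added to the base register. The function names and the idea of passing memory as a plain array are choices made for this sketch only.

```c
#include <stdint.h>

/* Sign-extend the 6-bit offset found in IR[5:0]. */
int16_t sext6(uint16_t instruction)
{
    int16_t offset = (int16_t)(instruction & 0x3F);
    if (offset & 0x20) { /* bit 5 set means a negative offset */
        offset -= 64;
    }
    return offset;
}

/* MAR <- BaseR + SEXT(IR[5:0]) */
uint16_t ldr_effective_address(uint16_t base, uint16_t instruction)
{
    return (uint16_t)(base + sext6(instruction));
}

/* Full LDR: returns the value that ends up in the destination register. */
uint16_t ldr(const uint16_t *memory, uint16_t base, uint16_t instruction)
{
    uint16_t mar = ldr_effective_address(base, instruction); /* MAR <- base + offset */
    uint16_t mdr = memory[mar];                              /* MDR <- M[MAR] */
    return mdr;                                              /* DR  <- MDR */
}
```

With R6 = x3000 and an offset of x5, the effective address is x3005, and the destination register receives whatever word is stored at x3005.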
I hope this question helps those new to addressing modes like I am :)
Thank you all.

Cortex-A5 unaligned access exception

I have some questions about the Cortex-A5 unaligned access exception.
Basic system information is below:
I and D caches enabled.
MMU disabled.
Firmware based.
While developing the DMA driver code, I wrote the following C code.
UINT32 DMA_InstMOV(UINT8 *buf, tENC_RD rd, UINT32 val)
{
    buf[0] = CMD_DMAMOV;
    buf[1] = rd;
    *((UINT32 *)&buf[2]) = val; // the exception occurs on this line
    return 6; // matches MOV r0,#6 in the disassembly
}
Disassembling the code above gives the following:
0x00000bf8: e1a03000 .0.. MOV r3,r0
0x00000bfc: e3a000bc .... MOV r0,#0xbc
0x00000c00: e5c30000 .... STRB r0,[r3,#0]
0x00000c04: e5c31001 .... STRB r1,[r3,#1]
0x00000c08: e5832002 . .. STR r2,[r3,#2]
0x00000c0c: e3a00006 .... MOV r0,#6
0x00000c10: e12fff1e ../. BX lr
The value of R3 is 0x08040000.
When the STR instruction is executed with an unaligned address, an exception (Data Abort) occurs.
Does the Cortex-A5 not support unaligned access?
According to DDI0406C_b_arm_architecture_reference_manual.pdf (Table A3-1, Alignment requirements of load/store instructions),
LDM and STR do not support unaligned access,
so a Data Abort exception occurs.
But I still have some questions.
This driver code works fine on a Cortex-R4 core. It didn't have any problem there.
The disassembly is the same.
What is even more confusing:
many Linux drivers also use code like the above.
If the MMU were turned on, would that solve this problem?
What am I doing wrong?
You need to enable the MMU to make unaligned accesses on the Cortex-A5. Also make sure the "A" bit in the SCTLR register is set to 0, so that strict alignment fault checking is disabled.
As I see it, you can't do what you are trying to do if buf is 32-bit aligned.
The compiler doesn't know how buf is aligned, so it assumes you know what you are doing. If you ask for a 32-bit write (by doing something like *((UINT32 *)&buf[2])), then the compiler assumes it is a valid thing to do. It therefore generates an STR instruction, which is only valid for aligned stores. Hence the fault: as buf is 32-bit aligned (as you state), buf[2] is clearly not.
I have no idea why the Cortex-R4 experience should be different. As far as I can tell, it operates with the same instruction set and alignment rules as the A5 (but I could be wrong). Maybe you got lucky and your bufs were unaligned such that buf[2] was 32-bit aligned.
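One common way to avoid the cast is to copy the value into the buffer with memcpy, which does not promise the compiler any particular alignment. The sketch below is a minimal rewrite of the function from the question along those lines. The standard integer types replace UINT8/UINT32 so that it compiles on its own, and the CMD_DMAMOV value 0xBC is taken from the MOV r0,#0xbc in the disassembly. One caveat, stated as an assumption to verify for your toolchain: a compiler that believes the target allows unaligned accesses may still turn the memcpy into a single unaligned STR, so on a system with the MMU off you may also need an option such as GCC's -mno-unaligned-access.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CMD_DMAMOV 0xBCU /* opcode byte, as seen in the disassembly */

/* Encode a DMAMOV instruction into buf; returns the encoded length in bytes. */
uint32_t dma_inst_mov(uint8_t *buf, uint8_t rd, uint32_t val)
{
    assert(buf != NULL);

    buf[0] = CMD_DMAMOV;
    buf[1] = rd;
    /* Copy the 32-bit immediate byte-wise; &buf[2] need not be word aligned. */
    memcpy(&buf[2], &val, sizeof val);
    return 6;
}
```

The bytes land in the buffer in the CPU's native byte order, exactly as the original cast would have stored them.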