How to decrease SPI overhead time for STM32L4 HAL library

How to decrease SPI overhead time for STM32L4 HAL library - accelerometer

I am using a STM32L476RG board and HAL SPI functions:
HAL_SPI_Transmit(&hspi2, &ReadAddr, 1, HAL_MAX_DELAY);
HAL_SPI_Receive(&hspi2, pBuffer, 4, HAL_MAX_DELAY);
I need to receive data from accelerometer's buffer with maximum speed and I have a problem with delay in these functions. As you can see on the oscilloscope screenshots, there are several microseconds during which nothing happens. I have no idea how to minimize the transmission gap.
I tried using HAL_SPI_Receive_DMA function and this delay was even bigger. Do you have any idea how to solve this problem using HAL functions or any pointers on how I could write my SPI function without these delays?

TL;DR: Don't use HAL; write your transfer functions using the Reference Manual.
HAL is hopelessly overcomplicated for time-critical tasks (among others). Just look at the HAL_SPI_Transmit() function: it's over 60 lines of code before it gets to actually touching the Data Register. HAL first marks the port access structure as busy even when there is no multitasking OS in sight, validates the function parameters, stores them in the hspi structure for no apparent reason, then goes on to figure out what mode the SPI is in, etc. It's not necessary to check timeouts in SPI master mode either, because the master controls all bus timings. If it can't get a byte out in a finite amount of time, then the port initialization is wrong, period.
Without HAL, it's a lot simpler. First, figure out what should go into the control registers, set CR1 and CR2 accordingly.
void SPIx_Init() {
    /* full duplex master, 8 bit transfer, default phase and polarity */
    SPIx->CR1 = SPI_CR1_MSTR | SPI_CR1_SPE | SPI_CR1_SSM | SPI_CR1_SSI;
    /* Disable receive FIFO, it'd complicate things when there is an odd number of bytes to transfer */
    SPIx->CR2 = SPI_CR2_FRXTH;
}
This initialization assumes that Slave Select (NSS or CS#) is handled by separate GPIO pins. If you want CS# managed by the SPI peripheral, then look up Slave select (NSS) pin management in the Reference Manual.
Note that a full duplex SPI connection cannot just transmit or receive; it always does both simultaneously. If the slave expects one command byte and answers with four bytes of data, that's a 5-byte transfer: the slave will ignore the last 4 bytes it receives, and the master should ignore the first byte it receives.
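The frame layout described above can be sketched as a few small helpers. This is a minimal sketch, not part of the original answer: the names spi_frame_len, spi_build_read_frame and spi_frame_payload, the 0x00 dummy bytes, and the one-byte command are assumptions for illustration; the accelerometer's datasheet defines the real command format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Total bytes clocked for one command byte plus payload_len reply bytes. */
static size_t spi_frame_len(size_t payload_len)
{
    return 1u + payload_len;
}

/* Fill tx with the command followed by dummy bytes (hypothetical helper).
 * tx must have room for spi_frame_len(payload_len) bytes. */
static void spi_build_read_frame(uint8_t cmd, uint8_t *tx, size_t payload_len)
{
    tx[0] = cmd;
    memset(tx + 1, 0x00, payload_len); /* dummy bytes, ignored by the slave */
}

/* The first received byte was clocked in during the command; skip it. */
static const uint8_t *spi_frame_payload(const uint8_t *rx)
{
    return rx + 1;
}
```

With 5-byte tx and rx buffers, one would call spi_build_read_frame(ReadAddr, tx, 4), run a single 5-byte full-duplex transfer, and then read the four data bytes through spi_frame_payload(rx).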
A very simple transfer function would be
void SPIx_Transfer(uint8_t *outp, uint8_t *inp, int count) {
    while(count--) {
        while(!(SPIx->SR & SPI_SR_TXE))
            ;
        *(volatile uint8_t *)&SPIx->DR = *outp++;
        while(!(SPIx->SR & SPI_SR_RXNE))
            ;
        *inp++ = *(volatile uint8_t *)&SPIx->DR;
    }
}
It can be further optimized when needed by making use of the SPI FIFO, interleaving writes and reads so that the transmitter is always kept busy.
If speed is critical, don't use generalized functions, or make sure they can be inlined when you do. Use a compiler with link-time optimization enabled, and optimize for speed (quite obviously).

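You can use HAL_SPI_TransmitReceive(&hspi2, ReadAddr, pBuffer, 1 + 4, HAL_MAX_DELAY); instead of a HAL_SPI_Transmit and a HAL_SPI_Receive. This avoids the gap between transmit and receive. Note that here ReadAddr must point to a 5-byte transmit buffer whose first byte is the read address, and pBuffer must hold 5 bytes; the first received byte is clocked in while the address is being sent and should be ignored.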
You can also try changing compilation settings to optimize the speed.
You can also check the accelerometer's datasheet; maybe you can read the whole buffer in a single frame, something like this:
HAL_SPI_TransmitReceive(&hspi2, ReadAddr, pBuffer, 1 + (4 * numOfSamples), HAL_MAX_DELAY);

What worked for me:
Read SPI registers directly
Optimize your function for speed
For an example function (code), see the solution by "JElli.1" on the ST Community.

Related

How fast can I read an ADC of a 2-dimensional array?

I have an application that must read a 32x32 element array through the ADC.
For each element I have to select the point, read the ADC, and turn the ADC off. Currently I am using a scanning method similar to LED matrix scanning. On each scan I read one point, then store the value in the array and transmit it.
However, I found this to be very slow. I want to use DMA to automate the reading, so that all I need to do is pass the data on.
Is there a way to do this?

I didn't quite understand your question.
Let's suppose that you want to read data from external signals on the MCU's ADC input pin.
First, I recommend one of two ways to run the ADC conversions:
Configure your ADC in continuous mode.
Configure your ADC in single mode, and configure another peripheral (like a timer) that will trigger the ADC to start a conversion at each event.
Second, there are two different ways to manage your converted data:
After each End Of Conversion Interrupt of the ADC you can store your data manually and do your stuff.
void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *AdcHandle)
{
    /* Get the converted value of regular channel */
    uhADCxConvertedValue = HAL_ADC_GetValue(AdcHandle);
}
Configure a DMA Handler that will deal with the ADC converted data and store it in RAM automatically without the CPU intervention.
#define ADC_CONVERTED_DATA_BUFFER_SIZE ((uint32_t) 32) /* Size of array aADCxConvertedData[] */
static uint16_t aADCxConvertedData[ADC_CONVERTED_DATA_BUFFER_SIZE];
HAL_ADC_Start_DMA(&AdcHandle, (uint32_t *)aADCxConvertedData, ADC_CONVERTED_DATA_BUFFER_SIZE);
You will find a lot of details in the Reference Manual.
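For the 32x32 array in the question, one possible arrangement is to let DMA fill a 32-sample buffer per row and then copy that buffer into a frame. The sketch below only shows that bookkeeping step and can be run on a host. The names frame, store_row, GRID_ROWS and GRID_COLS are hypothetical, and the "one DMA transfer per row" layout is an assumption, not something stated in the question.

```c
#include <stdint.h>
#include <string.h>

#define GRID_ROWS 32u
#define GRID_COLS 32u

/* Hypothetical frame buffer: one 16-bit sample per array element. */
static uint16_t frame[GRID_ROWS][GRID_COLS];

/* Copy one DMA-filled row buffer into the frame.
 * Returns 0 on success, -1 if the row index is out of range. */
static int store_row(uint32_t row, const uint16_t dma_row[GRID_COLS])
{
    if (row >= GRID_ROWS) {
        return -1;
    }
    memcpy(frame[row], dma_row, sizeof frame[row]);
    return 0;
}
```

On the target, store_row would typically be called once the DMA transfer for a row has completed, before selecting the next row.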

stm32 NVIC_EnableIRQ() bare metal equivalent?

I'm using the blue pill, and trying to figure out interrupts. I have an interrupt handler:
void __attribute__ ((interrupt ("TIM4_IRQHandler"))) myhandler()
{
    puts("hi");
    TIM4->EGR |= TIM_EGR_UG; // send an update event to reset timer and apply settings
    TIM4->SR &= ~0x01; // clear UIF
    TIM4->DIER |= 0x01; // UIE
}
I set up the timer:
RCC_APB1ENR |= RCC_APB1ENR_TIM4EN;
TIM4->PSC=7999;
TIM4->ARR=1000;
TIM4->EGR |= TIM_EGR_UG; // send an update event to reset timer and apply settings
TIM4->EGR |= (TIM_EGR_TG | TIM_EGR_UG);
TIM4->DIER |= 0x01; // UIE enable interrupt
TIM4->CR1 |= TIM_CR1_CEN;
My timer doesn't seem to activate. I don't think I've actually enabled it though. Have I??
I see in lots of example code commands like:
NVIC_EnableIRQ(USART1_IRQn);
What is actually going on in NVIC_EnableIRQ()?
I've googled around, but I can't find actual bare-metal code that's doing something similar to mine.
I seem to be missing a crucial step.
Update 2020-09-23 Thanks to the respondents to this question. The trick is to set the bit for the interrupt number in an NVIC_ISER register. As I pointed out below, this doesn't seem to be mentioned in the STM32F101xx reference manual, so I probably would never have been able to figure this out on my own; not that I have any real skill in reading datasheets.
Anyway, oh joy, I managed to get interrupts working! You can see the code here: https://github.com/blippy/rpi/tree/master/stm32/bare/04-timer-interrupt

Even if you go bare metal, you might still want to use the CMSIS header files, which provide declarations and inline versions of very basic ARM Cortex elements such as NVIC_EnableIRQ.
You can find NVIC_EnableIRQ at https://github.com/ARM-software/CMSIS_5/blob/develop/CMSIS/Core/Include/core_cm3.h#L1508
It's defined as:
#define NVIC_EnableIRQ __NVIC_EnableIRQ
__STATIC_INLINE void __NVIC_EnableIRQ(IRQn_Type IRQn)
{
    if ((int32_t)(IRQn) >= 0)
    {
        __COMPILER_BARRIER();
        NVIC->ISER[(((uint32_t)IRQn) >> 5UL)] = (uint32_t)(1UL << (((uint32_t)IRQn) & 0x1FUL));
        __COMPILER_BARRIER();
    }
}
If you want to, you can ignore __COMPILER_BARRIER(). Previous versions didn't use it.
The definition is applicable to Cortex-M3 chips. It may be different for other Cortex versions.
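The index and mask arithmetic in that definition can be checked on a host with a small sketch. The helper names nvic_iser_addr and nvic_iser_mask are made up for illustration; the base address 0xE000E100 is the NVIC_ISER0 address on Cortex-M3.

```c
#include <stdint.h>

#define NVIC_ISER_BASE UINT32_C(0xE000E100) /* address of NVIC_ISER0 */

/* Address of the 32-bit ISER word that holds the enable bit for this IRQ. */
static uint32_t nvic_iser_addr(uint32_t irqn)
{
    return NVIC_ISER_BASE + 4u * (irqn >> 5);
}

/* Mask to write to that word; writing 0 bits to ISER has no effect,
 * so a plain store of the mask is enough (no read-modify-write). */
static uint32_t nvic_iser_mask(uint32_t irqn)
{
    return UINT32_C(1) << (irqn & 0x1Fu);
}
```

For interrupt 30 this gives the word at 0xE000E100 and the mask 0x40000000; an interrupt number of 32 or more lands in the next word, NVIC_ISER1.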

Using the libraries is still considered bare metal, as long as there is no operating system. Anyway, it's good that you have a desire to learn at this level. Someone has to write the libraries for others.
I was going to do a full example here (it really takes very little code to do this), but will instead take from my code for this board that uses timer 1.
You obviously need the ARM documentation (the technical reference manual for the Cortex-M3 and the architecture reference manual for ARMv7-M) and the data sheet and reference manual for this ST part (no need for the programming manual from either company).
You have provided next to no information related to making the part work. You should never dive right into an interrupt; interrupts are an advanced topic, and you should poll your way as far as possible before finally enabling the interrupt into the core.
I prefer to get a UART working, then use that to watch the timer registers as they count, roll over, etc. Then see/confirm that the status register flag fired, and learn/confirm how to clear it (sometimes it is just a clear on read).
Then enable it into the NVIC and, by polling, see that the NVIC sees it and that you can clear it.
You didn't show your vector table; this is key to getting your interrupt handler working, let alone getting the core to boot.
08000000 <_start>:
8000000: 20005000
8000004: 080000b9
8000008: 080000bf
800000c: 080000bf
...
80000a0: 080000bf
80000a4: 080000d1
80000a8: 080000bf
...
080000b8 <reset>:
80000b8: f000 f818 bl 80000ec <notmain>
80000bc: e7ff b.n 80000be <hang>
...
080000be <hang>:
80000be: e7fe b.n 80000be <hang>
...
080000d0 <tim1_handler>:
The first word loads the stack pointer; the rest are vectors, each holding the address of the handler ORed with one (I'll let you look that up).
In this case the ST reference manual shows that interrupt 25 is TIM1_UP at address 0x000000A4, which mirrors to 0x080000A4, and that is where the handler's vector is in my binary. If yours is not there, you can use VTOR to relocate the vector table to an aligned space (sometimes SRAM or some other flash space that you build for this) and point there. Either way, your vector table entry must hold the proper pointer or your interrupt handler won't run.
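The addresses in the listing above follow from simple arithmetic, sketched here so it can be checked on a host. The helper names are hypothetical; the 0x40 offset reflects the initial stack pointer word plus the 15 system exception slots that precede the first external interrupt on a Cortex-M3.

```c
#include <stdint.h>

#define FLASH_BASE_ADDR  UINT32_C(0x08000000) /* vector table base in the listing above */
#define FIRST_IRQ_OFFSET 0x40u                /* initial SP + 15 system exception slots, 4 bytes each */

/* Address of the vector table entry for external interrupt number irqn. */
static uint32_t irq_vector_addr(uint32_t table_base, uint32_t irqn)
{
    return table_base + FIRST_IRQ_OFFSET + 4u * irqn;
}

/* Value stored in the table: handler address with bit 0 set (Thumb state). */
static uint32_t thumb_vector_value(uint32_t handler_addr)
{
    return handler_addr | 1u;
}
```

Interrupt 25 gives 0x080000A4, and the handler at 0x080000d0 is stored as 0x080000d1, matching the disassembly.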
volatile unsigned int counter;
void tim1_handler ( void )
{
    counter++;
    PUT32(TIM1_SR,0);
}
volatile isn't necessarily the right way to share a variable between an interrupt handler and the foreground task. It happens to work for me with this compiler and code; you can do the research and, even better, examine the compiler output (disassemble the binary) to confirm this isn't a problem.
ra=GET32(RCC_APB2ENR);
ra|=1<<11; //TIM1
PUT32(RCC_APB2ENR,ra);
...
counter=0;
PUT32(TIM1_CR1,0x00001);
PUT32(TIM1_DIER,0x00001);
PUT32(NVIC_ISER0,0x02000000);
for(rc=0;rc<10;)
{
    if(counter>=1221)
    {
        counter=0;
        toggle_led();
        rc++;
    }
}
PUT32(TIM1_CR1,0x00000);
PUT32(TIM1_DIER,0x00000);
A minimal init and runtime for tim1.
Notice that the value written to NVIC_ISER0 has bit 25 set, which lets interrupt 25 through.
Well before trying this code, I polled the timer status register to see how it works, compared that with the docs, and cleared the interrupt per the docs. Then, with that knowledge, I confirmed with the NVIC_ICPR0,1,2 registers that it was interrupt 25, and that there were no other gates between the peripheral and the NVIC, as some chips from some vendors may have.
Then I released it through to the core with NVIC_ISER0.
If you don't take these baby steps (and perhaps you already have), it only makes the task much worse and makes it take longer (yes, sometimes you get lucky).
TIM4 looks to be interrupt 30, offset/address 0x000000B8, in the vector table. NVIC_ISER0 (0xE000E100) covers the first 32 interrupts, so 30 would be in that register. If you disassemble the code you are generating with the library then we can see what is going on, and/or you can look it up in the library source code (as someone already did for you).
And then of course your timer 4 code needs to properly init the timer and cause the interrupt to fire, which I didn't check.
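One thing worth checking when validating the timer setup is how often the update event should fire for given PSC and ARR values. The sketch below computes that period; the function name is hypothetical, and the 8 MHz timer clock used in the note afterwards is an assumption, since the question does not state the clock configuration.

```c
#include <stdint.h>

/* Update-event period in microseconds for an up-counting timer:
 * period = (PSC + 1) * (ARR + 1) / f_timer. Returns 0 for a zero clock. */
static uint64_t timer_update_period_us(uint32_t timer_clk_hz, uint16_t psc, uint32_t arr)
{
    uint64_t ticks;

    if (timer_clk_hz == 0u) {
        return 0u;
    }
    ticks = ((uint64_t)psc + 1u) * ((uint64_t)arr + 1u);
    return ticks * UINT64_C(1000000) / timer_clk_hz;
}
```

With PSC=7999 and ARR=1000 from the question and an assumed 8 MHz timer clock, the counter runs at 1 kHz and the update event fires about once per second (1001 ms).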
There are examples, you need to just keep looking.
The minimum is
vector in the table
set the bit in the interrupt set enable register
enable the interrupt to leave the peripheral
fire the interrupt
Not necessarily in that order.

Is there a HAL library ISR function that automatically triggers when a byte is received into the Rx buffer of SPIx on STM32L4xx?

I am wondering if there is a user-definable, built-in ISR function in the HAL library that triggers as soon as a byte is received in the SPIx Rx buffer on STM32L4xx MCU? For instance, as a startup test, I would like to send one byte (0xBC) from a Master STM32L452 nucleo board via SPI2 to a Slave STM32L452 nucleo board. Once the Slave board receives the byte, it flashes LED2, and transmits a different byte (0xCD) back to the Master. Once the Master receives the byte, it flashes LED2 as confirmation. I have initialized both boards as Master/Slave, enabled DMA and global interrupts, 8 bits per transfer using MXcube. I can achieve what I want using the HAL_SPI_Transmit_DMA() and HAL_SPI_Receive_DMA() functions and delays written into the while(1) portion of my main routine (as below). However, I would like to achieve the same using an ISR function that automatically executes once a byte is received into the SPI Rx Buffer.
Master Code:
uint8_t spiDataReceive = 0;
uint8_t spiDataTransmit = 0xBC;
while(1) {
    if(!HAL_GPIO_ReadPin(GPIOC, GPIO_PIN_13)) {
        //Transmit byte 0xBC to Slave and Receive Response
        HAL_SPI_Transmit_DMA(&hspi2, &spiDataTransmit, 1);
        HAL_Delay(20);
        HAL_SPI_Receive_DMA(&hspi2, &spiDataReceive, 1);
        if(spiDataReceive == 0xCD) {
            flashLED2();
            spiDataReceive = 0x00;
        }
    }
}
Slave Code:
uint8_t spiDataReceive = 0;
uint8_t spiDataTransmit = 0xCD;
while(1) {
    HAL_SPI_Receive_DMA(&hspi2, &spiDataReceive, 1);
    HAL_Delay(10);
    if(spiDataReceive == 0xBC) {
        HAL_SPI_Transmit_DMA(&hspi2, &spiDataTransmit, 1);
        flashLED2();
        spiDataReceive = 0x00;
    }
}

No library is needed. You need to set the RXNEIE bit in the SPI CR2 register and enable the interrupt in the NVIC. That's two lines of code.
The only resource needed is the Reference Manual from the ST website.

Yes, the HAL provides user callbacks. In order to use those, you have to activate the corresponding interrupt in NVIC and have the HAL handler called by the interrupt vector table (please have a look at stm32l4xx_it.c, too).
But before you do so, you should consider the following questions:
If you feel confused or frustrated by the complexity of ST HAL libraries, read the Reference Manual and follow the advice of P__J__ (see other answer).
If you feel confused or frustrated by the complexity of the hardware interface, follow the present answer.
Both HAL_SPI_Transmit_DMA() and HAL_SPI_Transmit_IT() support a variable number of transfer bytes.
If all you are going to need is that one-byte transfer, HAL functions may be overkill.
Their advantage is that you can run some C library functions without dealing with HW register access in C (if that is quite new to you, coming from the Arduino ecosystem). And of course, you can transfer more than a single byte through the same interface when you extend your application.
You should decide whether you want to get an interrupt from the DMA you have tied to the SPI, or if you want to avoid the DMA and get the interrupt from the SPI itself. From my point of view, you should really not trigger an ISR by the same interrupt event which is used to start a DMA transfer to fetch the data!
In the same way as you find a description of the HW registers in the Reference Manual and Data Sheet of the controller, you find documentation on the HAL (layering concept, usage requirements etc.) in the User manual of STM32L4/L4+ HAL and low-layer drivers (see sections 70 and 102, resp., and chapter 3).
Of course, this interface aims mostly for abstraction and portability whereas directly addressing the HW interface usually allows much better efficiency in terms of latency/CPU load, and ROM/RAM usage. The "Low-Level" library drivers aim for a certain compromise, but if you are new to this whole topic and unsure what to start with, you should either start from the HW register interface, or from the portable HAL library API.
If the specification documents (HW or Lib description) are too abstract for you and you prefer some hands-on information source, you may want to first have a look at STM32Cube firmware examples for STM32CubeL4.
These also include SPI data exchange use cases (SPI_FullDuplex_ComIT for example) that are available for NUCLEO-L452RE (and others) and described in application note AN4726 (page 16).
In addition to the interrupt selection/handling, you should check two more aspects of your program:
If you get an interrupt from the hardware, there is no reason for the HAL_Delay() calls.
Keep in mind that on SPI, you can only "return" data from slave to master while the master is transferring data (which may be zero data).
Otherwise, the "transmit" call on the slave side will only put data into the TX register, and the SPI peripheral will wait infinitely for the SCK trigger from the master...
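The last point can be illustrated with a tiny host-side model of a full-duplex exchange. This is only a sketch: spi_slave_sim, spi_exchange and master_query are hypothetical names, and the model ignores real timing, FIFOs and chip select. It shows that the 0xCD reply from the question can only reach the master during a second frame in which the master clocks out a dummy byte.

```c
#include <stdint.h>

/* Minimal model of a slave: one byte waiting to go out, one byte last received. */
typedef struct {
    uint8_t tx_reg;  /* byte the slave will shift out on the next frame */
    uint8_t last_rx; /* byte the slave shifted in on the last frame */
} spi_slave_sim;

/* One 8-bit frame: both sides shift simultaneously while the master clocks. */
static uint8_t spi_exchange(spi_slave_sim *slave, uint8_t master_out)
{
    uint8_t master_in = slave->tx_reg;
    slave->last_rx = master_out;
    return master_in;
}

/* Master sends 0xBC, then clocks a dummy byte to fetch the slave's answer. */
static uint8_t master_query(spi_slave_sim *slave)
{
    (void)spi_exchange(slave, 0xBC); /* reply is not ready yet, discard */
    if (slave->last_rx == 0xBC) {
        slave->tx_reg = 0xCD;        /* slave firmware preloads its answer */
    }
    return spi_exchange(slave, 0x00); /* dummy byte supplies the clocks */
}
```

In the model, the byte returned by the first frame is whatever the slave had preloaded before it saw the command, which is why the master has to ignore it.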

one-wire over bit banging vs. one wire over usart

I want to use a sensor with a one-wire protocol. The question is which way of implementing this protocol is more efficient and more sensible: over USART or using bit banging?
If it is important, I'm using an AM2305 and STM32Fxx microcontrollers.

I prefer to use USART+DMA with one buffer for transmit and receive. I think this choice depends on your skill and the requirements of your project.
There are many ways to implement the one-wire protocol:
EXTI interrupts + timer in time-base mode
timer input capture + DMA
USART interrupts
USART (error interrupts only) + DMA
All of them have their advantages and disadvantages:
busy or free pins
busy or free peripherals (TIM, USART)
busy or free DMA channels
lower or higher frequency of interrupts in the program
easy or hard to implement
I have different projects in which the first and the last methods listed above work.
You must decide which method is preferable for you and your project.
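Whichever capture method is chosen, the captured timings still have to be turned into bytes. The sketch below decodes a list of high-pulse widths, as timer input capture might deliver them. It assumes the AM2305 uses the same single-bus framing as the AM2302/DHT22 family (40 bits, MSB first, a short high pulse for 0 and a long one for 1, last byte a checksum of the first four); the 50 us threshold and the function name are also assumptions, so check them against the sensor's datasheet.

```c
#include <stdint.h>

#define AM23XX_BITS         40
#define AM23XX_THRESHOLD_US 50u /* assumed split between "0" and "1" pulses */

/* Decode 40 high-pulse widths (in microseconds) into 5 bytes, MSB first.
 * Returns 0 if the checksum byte matches, -1 otherwise. */
static int am23xx_decode(const uint16_t high_us[AM23XX_BITS], uint8_t out[5])
{
    uint8_t sum;
    int bit;

    for (bit = 0; bit < 5; bit++) {
        out[bit] = 0;
    }
    for (bit = 0; bit < AM23XX_BITS; bit++) {
        out[bit / 8] = (uint8_t)(out[bit / 8] << 1);
        if (high_us[bit] > AM23XX_THRESHOLD_US) {
            out[bit / 8] |= 1u;
        }
    }
    sum = (uint8_t)(out[0] + out[1] + out[2] + out[3]);
    return (sum == out[4]) ? 0 : -1;
}
```

The same decoder works regardless of whether the pulse widths come from EXTI plus a timer or from input capture with DMA; only the way the widths are collected changes.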

How to use DMA for I2C read on Cortex M3

I'm using an Atmel SAM3S MCU, and their ASF stuff can do I2C (they call it TWI) communications. That's fine, except it's taking too much time from my main loop.
So, I'd like to be able to spark off a DMA transfer to read the data from the I2C device. However, all the docs say you can't turn on TX and RX simultaneously on a half-duplex device like TWI. The docs do show that it has a Peripheral DMA Controller (PDC) register section in the TWI registers, but I can't find any PDC examples, except for the USART, which is full duplex.
The only thing I can think of to try is to set TX section, and the next-RX section, and hope that it automatically enables RX after the TX is done.
Has anyone out there used DMA for an I2C read on the SAM3S? If so, could you point me to some docs or examples?

I'm not familiar with the particular part, however I would suggest that for many common usage patterns your best bet would probably be to only use DMA to handle multi-byte sequences of data. Most I2C peripherals allow data to be read out by performing a start with a "write" address byte, and, if that is acknowledged, sending out an address or other information about what data is desired. This is followed by a restart and a "read" address byte. If that is acknowledged, one may then perform all but one of the byte reads with the "ack" flag set. When that is finished, ask for the final byte to be read with the "ack" flag clear.
I'm not sure whether it would be worthwhile to use the DMA controller to clock out the bytes of the requested address, but it is probably not worthwhile to try to use it to clock out the first byte of the read command.
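The acknowledge pattern described above (ACK every byte except the last) can be written down as two small helpers. This is a generic sketch with hypothetical names. Leaving the final byte out of the DMA transfer so it can be read by hand is an assumption for illustration; whether the SAM3S PDC needs exactly this split has to be checked in its datasheet.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the master should ACK the byte at position index of a total-byte read. */
static bool i2c_read_should_ack(size_t index, size_t total)
{
    return index + 1u < total;
}

/* Number of bytes that could be left to DMA if the final, NACKed byte
 * is handled by the CPU. */
static size_t i2c_read_dma_count(size_t total)
{
    return (total > 1u) ? (total - 1u) : 0u;
}
```

For a 4-byte read this gives three ACKed bytes that DMA could move, followed by one NACKed byte.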
