Can STM32F4 DMA work in parallel? - stm32

I'm working on an application that requires 3x SPI (I'm the master on all of them) and an SDIO interface, using an STM32F446.
2 of the SPIs (SPI1 and SPI2) are sensors that need to be read every 1ms. For SPI1 I need to write 1 byte and the response to that will be the value. For SPI2 I need to write 1 byte and then read 6.
The third SPI (SPI3) and the SDIO are for communication/logging, and neither of them needs to transmit data at a fixed period.
Looking at the STM32F46x manual, section 9, it doesn't look like I can trigger DMA transfers from peripheral interrupts (that's too bad), but can I do the whole thing like this:
Timer interrupt every 1 ms: inside the ISR, SPI1 and SPI2 DMA transfers are triggered. The DMA transfers fill buffers with the received sensor data;
Every time I need to write to SDIO or SPI3, I start a DMA transfer with lower priority than the SPI1 and SPI2 ones.
I'm guessing that SPI1 and SPI2 can run in parallel since I have 2 DMA controllers, and that if SPI3 and SDIO transfers occur at the same time as the SPI1 and SPI2 ones, the lower-priority ones will be blocked until the controller is free. Is that right?

For SPI1 I need to write 1 byte and the response to that will be the value. For SPI2 I need to write 1 byte and then read 6.
Note that with SPI, reads and writes occur simultaneously: you read a byte by writing a dummy one. Take this into account when setting the number of words to transmit.
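For illustration, a minimal sketch of the SPI2 exchange done as a single 7-word full-duplex transfer (this assumes STM32 HAL; the handle, chip-select pin, command value and read-bit convention are placeholders for whatever your sensor actually expects). The same word count applies if you use the DMA variant (HAL_SPI_TransmitReceive_DMA()):

uint8_t tx[7] = { 0x80 | SENSOR_REG_ADDR };   /* read command (placeholder), remaining bytes are 0x00 dummies */
uint8_t rx[7];

HAL_GPIO_WritePin(CS2_GPIO_Port, CS2_Pin, GPIO_PIN_RESET);    /* assert CS */
HAL_SPI_TransmitReceive(&hspi2, tx, rx, 7, HAL_MAX_DELAY);    /* 7 words: 1 command + 6 dummies */
HAL_GPIO_WritePin(CS2_GPIO_Port, CS2_Pin, GPIO_PIN_SET);      /* release CS */
/* The 6 sensor bytes land in rx[1]..rx[6]; rx[0] was clocked in during the command byte. */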
DMA transfer fills buffer with received sensor data;
Some SPI slaves won't work properly unless you set CS high between transfers. If that's the case with your sensors, you should do that in the receiving DMA stream interrupt. If you are thinking of letting DMA fill up a large buffer automatically, that won't work in this case.
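For example, a sketch of releasing CS from the receive-complete callback (assuming HAL full-duplex DMA transfers; the CS pin names are placeholders):

void HAL_SPI_TxRxCpltCallback(SPI_HandleTypeDef *hspi)
{
  if (hspi->Instance == SPI1) {
    HAL_GPIO_WritePin(CS1_GPIO_Port, CS1_Pin, GPIO_PIN_SET);   /* release sensor 1 CS */
  } else if (hspi->Instance == SPI2) {
    HAL_GPIO_WritePin(CS2_GPIO_Port, CS2_Pin, GPIO_PIN_SET);   /* release sensor 2 CS */
    /* the new sensor sample is now in the SPI2 receive buffer */
  }
}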
I'm guessing that SPI1 and SPI2 can run in parallel since I have 2 DMA controllers, and that if SPI3 and SDIO transfers occur at the same time as the SPI1 and SPI2 ones, the lower-priority ones will be blocked until the controller is free. Is that right?
They will not be blocked, but interleaved as long as the higher priority transfer does not tie up the DMA bandwidth completely. No SPI transfer can do that, since SPI needs at least 16 clock cycles to transfer a single byte (at least 2 cycles/bit).
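As a rough illustration of the timer-triggered scheme from the question (assuming STM32 HAL; the timer instance, SPI handles, buffers, command values and CS pins are placeholders):

static uint8_t spi1_tx[2] = { SENSOR1_READ_CMD };   /* placeholder command byte + 1 dummy */
static uint8_t spi1_rx[2];
static uint8_t spi2_tx[7] = { SENSOR2_READ_CMD };   /* placeholder command byte + 6 dummies */
static uint8_t spi2_rx[7];

void HAL_TIM_PeriodElapsedCallback(TIM_HandleTypeDef *htim)
{
  if (htim->Instance == TIM6) {                                 /* the 1 ms timer (assumed instance) */
    HAL_GPIO_WritePin(CS1_GPIO_Port, CS1_Pin, GPIO_PIN_RESET);
    HAL_SPI_TransmitReceive_DMA(&hspi1, spi1_tx, spi1_rx, 2);   /* 1 command + 1 data byte */

    HAL_GPIO_WritePin(CS2_GPIO_Port, CS2_Pin, GPIO_PIN_RESET);
    HAL_SPI_TransmitReceive_DMA(&hspi2, spi2_tx, spi2_rx, 7);   /* 1 command + 6 data bytes */
  }
}

Both transfers then run in the background, and the CS pins are released in the HAL_SPI_TxRxCpltCallback() sketched earlier.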

Related

STM32MP157F dmaengine: dmaengine_prep_dma_memcpy works, dmaengine_prep_dma_cyclic does not

We have implemented a custom driver that uses DMA to copy a large amount of data from the FMC interface (an FPGA mapped to it) to RAM, using the STM32 MDMA engine with 32 DMA channels. The FPGA contains a small FIFO we want to copy the data from.
For very fast data acquisition, the setup time for new DMA transactions becomes critical!
The first implementation used a workqueue to create the next DMA transaction; it could not be done directly from the atomic "dma_completed" context because of some necessary IO that has to wait. This led to pauses of up to 5 ms between DMA transactions and to buffer overflows in the FPGA's FIFO.
As I am copying from a memory mapped region to RAM, I am using dmaengine_prep_dma_memcpy.
I implemented a number of improvements that reduced the pause between DMAs:
I am fusing DMA-mapped pages so that fewer DMA transaction entries have to be created and less DMA engine programming is necessary.
I am preparing the next DMA pages up front, so the next DMA transaction can be started directly from the "dma_completed" routine.
I am using a second DMA channel and toggling between the two when dma_completed is called. This allows setting up a second DMA transfer while the first one is still running. Although the Linux DMA API allows this with one channel, the MDMA engine does not and ignores the added transactions.
Usually the pause is now lower than 1 ms, but there are spikes where the FIFO nearly overflows.
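For illustration, a rough sketch of that two-channel ping-pong using the dmaengine API; struct acq_ctx, all of its fields, and the function names are hypothetical placeholders, only the dmaengine/workqueue calls are real kernel API, and error handling is mostly omitted:

#include <linux/dmaengine.h>
#include <linux/workqueue.h>

struct acq_ctx {
	struct dma_chan *chan[2];                   /* the two MDMA channels */
	int idx;                                    /* channel currently running */
	struct dma_async_tx_descriptor *next_desc;  /* descriptor prepared ahead of time */
	struct work_struct prep_work;
	dma_addr_t fifo_addr;                       /* FPGA FIFO bus address */
	dma_addr_t next_dst;                        /* next RAM destination */
	size_t chunk_len;
};

static void fifo_prep_next(struct work_struct *work);

static void fifo_dma_complete(void *param)
{
	struct acq_ctx *ctx = param;
	struct dma_chan *next_chan = ctx->chan[ctx->idx ^ 1];

	/* The descriptor for the other channel was prepared ahead of time,
	 * so this atomic callback only has to submit it and kick the engine. */
	dmaengine_submit(ctx->next_desc);
	dma_async_issue_pending(next_chan);
	ctx->idx ^= 1;

	/* Defer preparation of the following descriptor (page fusing,
	 * dmaengine_prep_dma_memcpy(), ...) to non-atomic context. */
	schedule_work(&ctx->prep_work);
}

static void fifo_prep_next(struct work_struct *work)
{
	struct acq_ctx *ctx = container_of(work, struct acq_ctx, prep_work);
	struct dma_chan *chan = ctx->chan[ctx->idx ^ 1];   /* the currently idle channel */

	ctx->next_desc = dmaengine_prep_dma_memcpy(chan, ctx->next_dst,
						   ctx->fifo_addr, ctx->chunk_len,
						   DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
	if (!ctx->next_desc)
		return;                                    /* error handling omitted */
	ctx->next_desc->callback = fifo_dma_complete;
	ctx->next_desc->callback_param = ctx;
}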
Finally I tried to use dmaengine_prep_dma_cyclic. This would be perfect: a continuously running DMA with no setup time needed between interrupts.
But this does not work. Or rather: I can't get it to work...
The transaction created with dmaengine_prep_dma_cyclic does not want to start!
I am getting a new dma_cookie, and any status request to the channel returns "DMA_IN_PROGRESS". It never completes, and the completion callback is never called.
Though dmaengine_prep_dma_memcpy works fine...
I think this is because of the difference between software- and hardware-triggered DMA transactions.
Looking into stm32-mdma.c, I see that dmaengine_prep_dma_memcpy has its own setup routine, whereas dmaengine_prep_dma_cyclic uses stm32_mdma_set_xfer_param(), which always configures a HW request.
My very big questions:
Is there a way to use dmaengine_prep_dma_cyclic for a MEMORY to MEMORY DMA transaction (software triggered)? This would be the perfect solution to my performance problem...
Are we missing some signals to connect the FPGA to the SOC? My FPGA programming colleague suspects a missing TSEL (trigger selection) setting. He suspects dmaengine_prep_dma_cyclic will work then.
If a minimum driver module source code example would help in getting better answers, I can provide one in short time. Please note that this is highly hardware specific. Other SOCs than STM32MP157F may have different behaviour.
Thanks for every feedback!
Bye Gunther
References:
https://wiki.st.com/stm32mpu/wiki/Dmaengine_overview
https://github.com/STMicroelectronics/linux/blob/v5.15-stm32mp/drivers/dma/stm32-mdma.c

STM32 I2C interrupt method requires a blocking while loop?

I have a Nucleo-F446RE, and I'm trying to get I2C working with an IMU I have (LSM6DS33). I am using STM32CubeMX and checked out all the example code for my board related to I2C. Specifically I'll be talking about their 'I2C_TwoBoards_ComIT' example, but all their examples which use the interrupt method have this same quirk. Here is a snippet of their code from main.c:
/* The board sends the message and expects to receive it back */
do
{
  /*##-2- Start the transmission process #####################################*/
  /* While the I2C in reception process, user can transmit data through
     "aTxBuffer" buffer */
  if (HAL_I2C_Master_Transmit_IT(&I2cHandle, (uint16_t)I2C_ADDRESS, (uint8_t*)aTxBuffer, TXBUFFERSIZE) != HAL_OK)
  {
    /* Error_Handler() function is called in case of error. */
    Error_Handler();
  }

  /*##-3- Wait for the end of the transfer ###################################*/
  /* Before starting a new communication transfer, you need to check the current
     state of the peripheral; if it's busy you need to wait for the end of current
     transfer before starting a new one.
     For simplicity reasons, this example is just waiting till the end of the
     transfer, but application may perform other tasks while transfer operation
     is ongoing. */
  while (HAL_I2C_GetState(&I2cHandle) != HAL_I2C_STATE_READY)
  {
  }

  /* When Acknowledge failure occurs (Slave don't acknowledge its address)
     Master restarts communication */
}
while (HAL_I2C_GetError(&I2cHandle) == HAL_I2C_ERROR_AF);
Under comment ##-3- they explain that unless we wait for the I2C state to be ready again, after sending a command, the next command will overwrite the previous one, so they use a while loop which waits for the I2C state to be 'ready' before continuing.
Isn't this a very inefficient way to use an interrupt, and no different from using the standard polling method? Both block the main code, so what's the purpose of the interrupt?
In my personal example, I want to collect the accelerometer/gyroscope data at the 1.66 kHz rate which the IMU is capable of. I use a 2 kHz timer to send an I2C command to read the acc/gyr data-ready register, and if the data is ready for either sensor I read their 6 bytes to get the x/y/z plane information. Using the polling method is too slow, as blocking the code at a rate of 2 kHz is not efficient, but the interrupt method doesn't seem to be any faster, as I still need to hang the system in the aforementioned while loop to check if the I2C is ready for another command. What am I missing here?
Is this (the example you provided) an efficient way of doing things? No. Can the blocking part be avoided? Yes. It's only a small example, a proof of concept, so there is some blocking in there. You should look deeper at why it is there and how you can implement what it does without blocking.
The point of that blocking part is to not start an I2C communication while another one is in progress. The problem is that even though your line of code that sends something over I2C has already executed, the data is still being physically sent over the bus, simply because your MCU is much faster than I2C. You need to wait until the I2C line is idle and available for a new transmission.
How do you achieve that with interrupts without wasting cycles and processing time? Since in your case you can easily estimate the amount of data in each transmission, there is no problem estimating how long every transmission will take given your I2C speed. Since you're smartly and correctly using a timer to schedule regular transmissions, you should be able to set the timer in such a way that by the next timer interrupt, which starts the next transfer, the previous communication has already ended.
For example, if you set the timer to 1Hz to start transmission, you can obviously be sure that by the next interrupt all the communication has happened. You don't need to poll anything at all.
I don't see much point in I2C-polling the IC at 2 kHz if it produces data at 1.66 kHz. You will have uneven time periods between samples: some data will be very fresh, while some will arrive with a delay, and there will also be transactions with no data ready at all. It would be better to poll it at something like 1.5-1.6 kHz and just expect data to always be there. Of course, that assumes the communication fits into the 1.5 kHz period, which requires some napkin math.
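As an illustration of that non-blocking, timer-paced pattern (assuming STM32 HAL; the timer instance, I2C handle, device/register addresses and buffer are placeholders):

static uint8_t imu_buf[6];
static volatile uint8_t imu_sample_ready;

void HAL_TIM_PeriodElapsedCallback(TIM_HandleTypeDef *htim)
{
  if (htim->Instance == TIM7 &&
      HAL_I2C_GetState(&hi2c1) == HAL_I2C_STATE_READY) {
    /* Kick off a 6-byte read of the acc/gyro output registers;
       the CPU is free until the transfer-complete interrupt fires. */
    HAL_I2C_Mem_Read_IT(&hi2c1, IMU_ADDR, IMU_OUT_REG,
                        I2C_MEMADD_SIZE_8BIT, imu_buf, 6);
  }
}

void HAL_I2C_MemRxCpltCallback(I2C_HandleTypeDef *hi2c)
{
  /* imu_buf now holds the new sample; flag it for the main loop. */
  imu_sample_ready = 1;
}

If the timer period is chosen so that the previous transfer always completes in time, as described above, the READY check is just a safety net rather than a poll.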

Does HAL_SPI_Transmit() discard received data?

Suppose I have two STM boards with a full duplex SPI connection (one is master, one is slave), and suppose I use HAL_SPI_Transmit() and HAL_SPI_Receive() on each end for the communication.
Suppose further that I want the communication to consist of a series of single-byte command-and-response transactions: master sends command A, slave receives it and then sends response A; master sends command B, slave receives it and then sends response B, and so on.
When the master calls HAL_SPI_Transmit(), the nature of SPI means that while it clocks out the first byte over the MOSI line, it is simultaneously clocking in a byte over the MISO line. The master would then call HAL_SPI_Receive() to furnish clocks for the slave to do the transmitting of its response. My question: What is the result of the master's HAL_SPI_Receive() call? Is it the byte that was simultaneously clocked in during the master's transmit, or is it what the slave transmitted afterwards?
In other words, does the data that is implicitly clocked in during HAL_SPI_Transmit() get "discarded"? I'm thinking it must, because otherwise we should always use the HAL_SPI_TransmitReceive() call and ignore the received part.
(Likewise, when HAL_SPI_Receive() is called, what is clocked OUT, which will be seen on the other end?)
Addendum: Please don't say "Don't use HAL". I'm trying to understand how this works. I can move away from HAL later; for now, I'm a beginner and want to keep it simple. I fully recognize the shortcomings of HAL. Nonetheless, HAL exists and is commonly used.
Yes, if you only use HAL_SPI_Transmit() to send data, the data received at the same clock events gets discarded.
As an alternative, use HAL_SPI_TransmitReceive() to send data and receive data at the same clock events. You would need to provide two arrays, one that contains data that will be sent, and the other array will be populated when bytes are received at the same clock events.
E.g. if your STM32 SPI slave wants to send data to a master that plans to clock out 4 bytes (the master sends 0xFF bytes to retrieve data from the slave), HAL_SPI_TransmitReceive() will let you put the data you wish to send in one array and receive all the clocked-in 0xFF bytes in another array.
I have never used HAL_SPI_Receive() on its own, but the microcontroller calling it can clock out arbitrary data as long as the clock signals are valid. If you use this function, the other microcontroller should assume that the data it receives must be ignored. You could also use a logic analyzer to trace the SPI data exchange between the two microcontrollers when using HAL_SPI_Transmit() and HAL_SPI_Receive().
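For illustration, a sketch of one single-byte command-and-response transaction on the master using only full-duplex calls, so nothing useful is discarded (assuming HAL; the handle and command value are placeholders):

uint8_t cmd   = CMD_A;   /* command byte (placeholder) */
uint8_t dummy = 0xFF;    /* dummy byte to provide clocks for the response */
uint8_t junk, response;

/* Phase 1: send the command; whatever the slave shifts out now is junk. */
HAL_SPI_TransmitReceive(&hspi1, &cmd, &junk, 1, HAL_MAX_DELAY);

/* Phase 2: clock out a dummy byte so the slave can shift its response back. */
HAL_SPI_TransmitReceive(&hspi1, &dummy, &response, 1, HAL_MAX_DELAY);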

SPI DMA CIRCULAR Mode - stm32f4

Does anyone have sample code for transferring data with SPI in DMA CIRCULAR mode for STM32 (16-bit)?
With my code, the master sends 16-bit data and receives the answer in the next cycle. But this transaction happens with a one-cycle delay.
SPI is supposed to work that way.
When the SPI data register is written the first time, the SPI starts sending the data and immediately signals the DMA controller that it's ready for the next data word. Now there are two data words in the transmitter while it has barely started receiving the first one. When the first outgoing word is completely transmitted and the first incoming word is completely received (these happen almost simultaneously), the SPI starts sending the second word already in the data register and signals the transmit DMA channel that it's ready for the third data word; at about the same time it also signals the receiving channel that the first incoming data word is ready.
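For reference, a minimal sketch of starting a 16-bit full-duplex circular transfer (this assumes the SPI data size is configured as 16 bits and both DMA streams are set to CIRCULAR mode, e.g. in CubeMX; names and buffer length are placeholders):

#define NWORDS 4
static uint16_t tx_buf[NWORDS];
static uint16_t rx_buf[NWORDS];

/* HAL takes byte pointers, but because the SPI/DMA data width is 16 bits,
   the Size argument counts 16-bit frames. The transfer then repeats forever
   in circular mode, with half/full-complete callbacks marking buffer halves. */
HAL_SPI_TransmitReceive_DMA(&hspi1, (uint8_t *)tx_buf, (uint8_t *)rx_buf, NWORDS);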

I2C repeated start

I am trying to use a TC74 (or DS1621) temperature sensor, which comes with an I2C interface. My I2C ISR is so far able to write command and config bytes to the chip. However, I don't know how to instruct the ISR to jump to state 0x10 (repeated start) for a read operation. The read procedure is as follows:
start bit by micro-controller (ATTINY48 in my case)
sending slave address+w (in state 0x8), ACK from slave
sending command byte to slave (in state 0x18), ACK from slave
at this point (state 0x28) ISR must send a repeated start and jump to state 0x10
then sending slave address+R, ACK from slave
then in state 0x40 data will be read from slave, NACK to slave
in state 0x58 data is ready and copied to proper variable, stop bit will be transmitted.
I can set a flag every time I call the TC74 read function and check that flag inside the ISR, so that instead of sending the stop bit after writing the data byte to the TC74, it will issue a repeated start bit. However, I am not sure whether this is the correct and standard method. Generally, in many I2C states, the next state must be decided.
How should I instruct the ISR in each state to jump to the desired next state?
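For illustration, a sketch of that flag-based approach for the read sequence listed above (AVR TWI registers and status codes are as in the datasheet; the address and command constants are placeholders, and the plain write/config path is omitted):

#include <avr/io.h>
#include <avr/interrupt.h>

volatile uint8_t tc74_read_pending;   /* set by the read function, checked in the ISR */
volatile uint8_t tc74_result;

void tc74_start_read(void)
{
    tc74_read_pending = 1;
    TWCR = (1 << TWINT) | (1 << TWSTA) | (1 << TWEN) | (1 << TWIE);   /* send START */
}

ISR(TWI_vect)
{
    switch (TWSR & 0xF8) {
    case 0x08:                                    /* START transmitted */
        TWDR = TC74_ADDR << 1;                    /* SLA+W */
        TWCR = (1 << TWINT) | (1 << TWEN) | (1 << TWIE);
        break;
    case 0x18:                                    /* SLA+W ACKed: send command byte */
        TWDR = TC74_CMD_RTR;                      /* "read temperature" command (placeholder) */
        TWCR = (1 << TWINT) | (1 << TWEN) | (1 << TWIE);
        break;
    case 0x28:                                    /* byte ACKed: repeated START for a read, STOP otherwise */
        if (tc74_read_pending)
            TWCR = (1 << TWINT) | (1 << TWSTA) | (1 << TWEN) | (1 << TWIE);
        else
            TWCR = (1 << TWINT) | (1 << TWSTO) | (1 << TWEN);
        break;
    case 0x10:                                    /* repeated START transmitted */
        TWDR = (TC74_ADDR << 1) | 1;              /* SLA+R */
        TWCR = (1 << TWINT) | (1 << TWEN) | (1 << TWIE);
        break;
    case 0x40:                                    /* SLA+R ACKed: receive one byte, reply NACK (TWEA clear) */
        TWCR = (1 << TWINT) | (1 << TWEN) | (1 << TWIE);
        break;
    case 0x58:                                    /* data received, NACK returned */
        tc74_result = TWDR;
        tc74_read_pending = 0;
        TWCR = (1 << TWINT) | (1 << TWSTO) | (1 << TWEN);   /* send STOP */
        break;
    }
}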