STM32MP157F dmaengine: dmaengine_prep_dma_memcpy works, dmaengine_prep_dma_cyclic does not - stm32

We have implemented a custom driver that uses DMA to copy a large amount of data from the FMC interface (an FPGA is mapped to it) to RAM, using the STM32 MDMA engine with its 32 DMA channels. The FPGA contains a small FIFO that we want to copy the data from.
For very fast data acquisition the setup time for new DMA transactions becomes critical!
The first implementation used a workqueue to create the next DMA transaction. This could not be done directly from the atomic "dma_completed" context because it involves some necessary I/O that has to wait. This led to pauses of up to 5 ms between DMA transactions and to buffer overflows in the FPGA's FIFO.
As I am copying from a memory mapped region to RAM, I am using dmaengine_prep_dma_memcpy.
I implemented a number of improvements that reduced the pause between DMAs:
I am fusing DMA-mapped pages so that fewer DMA transaction entries have to be created and less DMA engine programming is necessary.
I am preparing the next DMA pages upfront, so the next DMA transaction can be started directly from the "dma_completed" routine.
I am using a second DMA channel and toggle between the two when dma_completed is called. This allows setting up a second DMA while the first one is still running. Although the Linux DMA API allows this with one channel, the MDMA engine does not and ignores the added transactions.
The pause is now usually below 1 ms, but there are spikes where the FIFO nearly overflows.
Finally I tried to use dmaengine_prep_dma_cyclic. This would be perfect: a continuously running DMA with no setup time between interrupts.
But this does not work. Or better: I cannot get it to work...
The transaction created with dmaengine_prep_dma_cyclic does not start!
I get a new dma_cookie, and any status request on the channel returns "DMA_IN_PROGRESS". It never completes, and the completion callback is never called.
dmaengine_prep_dma_memcpy, however, works fine...
I think this is because of the difference between software-triggered and hardware-triggered DMA transactions.
Looking into stm32-mdma.c, I see that dmaengine_prep_dma_memcpy has its own setup routine, whereas dmaengine_prep_dma_cyclic uses stm32_mdma_set_xfer_param(), which always configures a HW request.
My two big questions:
Is there a way to use dmaengine_prep_dma_cyclic for a memory-to-memory (software-triggered) DMA transaction? This would be the perfect solution to my performance problem...
Are we missing some signals to connect the FPGA to the SoC? My FPGA programming colleague suspects a missing TSEL (trigger selection) setting and expects dmaengine_prep_dma_cyclic to work once that is set.
If a minimal driver module source code example would help in getting better answers, I can provide one at short notice. Please note that this is highly hardware specific; SoCs other than the STM32MP157F may behave differently.
Thanks for any feedback!
Bye Gunther
References:
https://wiki.st.com/stm32mpu/wiki/Dmaengine_overview
https://github.com/STMicroelectronics/linux/blob/v5.15-stm32mp/drivers/dma/stm32-mdma.c

Related

STM32 I2C interrupt method requires a blocking while loop?

I have a Nucleo-F446RE, and I'm trying to get I2C working with an IMU I have (LSM6DS33). I am using STM32CubeMX and checked out all the example code for my board that is related to I2C. Specifically I'll be talking about their 'I2C_TwoBoards_ComIT' example, but all their examples that use the interrupt method have this same quirk. Here is a snippet of their code from main.c:
/* The board sends the message and expects to receive it back */
do
{
  /*##-2- Start the transmission process #####################################*/
  /* While the I2C in reception process, user can transmit data through
     "aTxBuffer" buffer */
  if (HAL_I2C_Master_Transmit_IT(&I2cHandle, (uint16_t)I2C_ADDRESS, (uint8_t*)aTxBuffer, TXBUFFERSIZE) != HAL_OK)
  {
    /* Error_Handler() function is called in case of error. */
    Error_Handler();
  }

  /*##-3- Wait for the end of the transfer ###################################*/
  /* Before starting a new communication transfer, you need to check the current
     state of the peripheral; if it’s busy you need to wait for the end of current
     transfer before starting a new one.
     For simplicity reasons, this example is just waiting till the end of the
     transfer, but application may perform other tasks while transfer operation
     is ongoing. */
  while (HAL_I2C_GetState(&I2cHandle) != HAL_I2C_STATE_READY)
  {
  }

  /* When Acknowledge failure occurs (Slave don't acknowledge its address)
     Master restarts communication */
}
while (HAL_I2C_GetError(&I2cHandle) == HAL_I2C_ERROR_AF);
Under comment ##-3- they explain that unless we wait for the I2C state to be ready again, after sending a command, the next command will overwrite the previous one, so they use a while loop which waits for the I2C state to be 'ready' before continuing.
Isn't this a very inefficient way to use an interrupt, and no different from using the standard polling method? Both block the main code, so what's the purpose of the interrupt?
In my case, I want to collect the accelerometer/gyroscope data at the 1.66 kHz rate the IMU is capable of. I use a 2 kHz timer to send an I2C command that reads the acc/gyr data-ready register, and if data is ready for either sensor I read its 6 bytes to get the x/y/z axis information. The polling method is too slow, since blocking the code at a rate of 2 kHz is inefficient, but the interrupt method doesn't seem to be any faster, as I still need to hang the system in the aforementioned while loop to check whether I2C is ready for another command. What am I missing here?
Is this (the example you provided) an efficient way of doing things? No. Can the blocking part be avoided? Yes. It's only a small example, a proof of concept, so there is some blocking in there. You should look deeper at why it is there and how you can implement what it does without blocking.
The point of the blocking part is to not start an I2C communication while another one is in progress. The problem is that when your line of code that sends something over I2C has finished executing, the data is still physically being sent over the line, simply because your MCU is much faster than I2C. You need to wait until the I2C bus is idle and available for transmission.
How do you achieve that with interrupts without wasting cycles and processing time? Since in your case you can easily estimate the amount of data per transmission, there is no problem estimating how long every transmission will take at your I2C speed. Since you're smartly and correctly using a timer to schedule regular transmissions, you should be able to set the timer so that by the next timer interrupt, which starts the next transmission, the previous communication has already ended.
For example, if you set the timer to 1Hz to start transmission, you can obviously be sure that by the next interrupt all the communication has happened. You don't need to poll anything at all.
I don't see much point in polling the IC over I2C at 2 kHz if it produces data at 1.6 kHz. You will have uneven time periods between samples: some data will be very fresh, some will arrive with a little delay, and some transfers will find no data ready. It would be better to poll at something like 1.5-1.6 kHz and just expect data to always be there, provided of course that the communication fits into the 1.5 kHz period, which requires some napkin math.

Uart dma receive interrupt stops receiving data after several minutes

I have a project that uses an STM32F746G Discovery board. It receives fixed-size data from the UART sequentially, and a DMA callback (the HAL_UART_RxCpltCallback function) informs the application each time a receive completes. It works fine at the beginning, but after several minutes of running the DMA callback stops being called, and as a result the specified parameter value no longer gets updated. Because the parameter is also used in another thread (actually an RTOS-defined timer), I believe this problem is caused by a lack of thread safety. My problem is that mutexes and semaphores are not supported in ISRs, and I need to protect my variable in the DMA callback, which is an interrupt routine. I am using Keil RTX to handle multithreading, and the timer I use is an osTimer defined in RTX. How can I handle the issue?
Generally, only one thread should communicate with the ISR. If multiple threads are accessing a variable shared with an ISR, your design is wrong and needs to be fixed. In case of DMA, only one thread should access the buffer.
You'll need to protect the variables shared between that thread and the ISR, not necessarily with a mutex/semaphore but perhaps with something simpler like guaranteeing atomic access (the best solution if possible), or by using the non-interruptible ability that many ISRs have. Example for simple, single-threaded MCU applications. Alternatively, just temporarily disable interrupts during access, but that may not be possible, depending on real-time requirements.

STM32F302 Adc with DMA for different size and channel

I'm using an STM32F302 in a QFN32 package and unfortunately it has only one ADC module. One channel must take around 500 samples in one period, and it must be synchronized with a PWM signal (I am thinking of using a timer and toggling this I/O in the callback, because while reading its ADC channel I must know whether the I/O is high or low and interpret the value accordingly). Furthermore, there are 4 more channels which must be read (they don't need as many samples; 8 or 16 will be enough). Can I do this with a single ADC module? If yes, how? Thank you.
ST ADCs have two conversion modes: regular and injected.
Regular mode is what all ADCs have. You start it, either by software or by a trigger (timer/GPIO), and it does one conversion or a sequence of conversions. The result is written to a common register, which the DMA takes care of.
Injected mode is a high-priority, preempting conversion. Once you start an injected conversion sequence by software or trigger, the ADC injects it between the regular conversions as a higher-priority one. The results are stored in the injected result registers, to be read in the interrupt.
Only regular mode supports DMA. See AN4195 for more info.
I suggest you use a timer to trigger a regular sequence for your fast channel, with a circular DMA setup to move the data, and another timer to trigger the injected sequence. There is a maximum of 4 injected channels, so you are in luck!
Obviously, you can do this the other way around: fast injected conversions and slow regular ones. But you'll need another timer synchronized to the injected start trigger to get the DMA to move the data.
That is, if your sample rate does not allow immediate processing. Otherwise you can just use the ISR.

A FreeRTOS task suddenly does nothing

I'm developing a real-time system with FreeRTOS on an STM3240G board.
The system contains several different tasks (GUI, KB, ModBus, Ctrl, etc.) with different priorities.
The GUI seems to display a little slowly, so I used profiler software to see what is going on between the tasks during a run. The profiler shows me which task was running at each moment (with microsecond resolution) and which interrupts arrived. It also lets me "mark" different locations in the code so I know when execution was there. So I ran the program and made a recording.
Looking at the recording, I saw that (for example) the Ctrl task spent 15 milliseconds between two lines of code (the duration varies). There was no task switch, no interrupt arrived, and after this time the system continued normally from that point, according to the recording and my marks.
I tried disabling different interrupts, without any success.
Does anyone have any idea what it could be?
On the eval board, there is a MIPI connector that supports ETM trace, a considerable luxury/advantage over other development boards!
If you also have one of the more expensive debug adapters that support ETM trace (for example uTrace, J-Trace, ULINKpro or I-jet Trace), you should use it to trace the entire control flow without having to instrument tasks and ISRs.
Otherwise, you should re-check whether really every IRQ handler has been instrumented (credits to #RealtimeRik, who pointed this out) at a low enough level that the profiler can really spot it.
Especially if you are using third-party libraries, some IRQs may be serviced by handlers that you (or the profiler) don't have the code of.
Having been through this once myself, I suggest you review the NVIC settings carefully to check whether there is an ISR you weren't aware of.
Another question is how the profiler works internally.
If it is based on ETM/TPIU or ITM/SWO tracing, see above.
If it creates and counts a huge number of snapshots, there might be some systematic cases that prevent snapshots from being taken in a particular part of the software:
Could it be that a non-maskable interrupt or exception handler is running in a way that cannot be interrupted by the mechanism that collects snapshots?
Could it be that the timing of the control task correlates (by some coincidence) to a timing signal used for snapshots?
What happens if you insert some time-consuming extra code in front of the unexpected "profiling gap" (e.g., some hundreds or thousands of NOPs)?

What are the differences between Clock and I/O interrupts?

What are the differences between clock and I/O interrupts?
As I understand it a clock interrupt uses the system clock for interrupting the CPU and an I/O interrupt is sent to the CPU based off of program input or output completion. This was helpful in understanding interrupts in general, but I'm trying to compare these two kinds.
edit:
In a multiprogramming context, using a uniprocessor (to make things simple)
Timer/clock interrupts are often used for scheduling. These interrupts invoke the scheduler and it may switch the currently executing thread/process to another by saving the current context and loading another one.
Other than the purpose, an interrupt is an interrupt.
The main purpose of the clock interrupt is to support what we call "multitasking". It gives the impression that work is going on in parallel (that is, that many applications are running at the same time), but in reality it is not. The clock sends an interrupt to the processor after a specified fraction of a second, depending on system speed, so that the processor suspends its current thread and saves its address and data to the stack, putting the interrupted application on hold.
I hope this helps.