Is DMA effective for small packets?

Is DMA effective for small packets? - operating-system

Is DMA chip effective for small data packet like 5 bytes.
I am working on a embedded project with STM32, I am using DMA for receiving 3 bytes packet but when using DMA to transmit 5 bytes packet, I don't get the packet at the receiving side immediately, but when using a normal transfer, I get the packet more quickly.
So can we say that DMA is not effective for small packet ?

The problem with using DMA for receiving is that the DMA transfer interrupt occurs then the DMA buffer if full and/or half-full, if receipt of the packet does not fill the buffer to reach the half or full-transfer threshold you will not get an interrupt to indicate the availability of the data.
So say your receiver has 6 byte DMA buffer and generates an interrupt at the half and full transfer, you send a 3 byte packet and you will get an interrupt at the end of every packet. If you send a 5 byte packet, you will get an interrupt for the first three bytes, then the next two bytes will sit in the buffer indefinitely until one more byte (from the next packet) arrives.
So it is not really a matter of "small packets" but rather packet alignment with the DMA buffer. It is not even in fact a matter of "packets" - if the data were a stream, you end up with the same problem when the data stream ceases if it happens not to be a multiple of the DMA buffer size.
Moreover even if the packets are an exact multiple of the DMA buffer size, if you get data loss over the link, you will no longer be aligned with the buffer and will encounter the same problem.
The solution in this case if to set a timer on each receive DMA half/full transfer to use as a timeout. The timeout would normally be set to something a little more than the time taken to transfer one half of the buffer (assuming you are using the half/full transfer to allow double or "ping-pong" buffering), so that if the buffer half/full transfer is not achieved, the timer expires and on the timer handler you retrieve the data that has been transferred. When the DMA half/full transfer interrupt does occur, you need to account for the data you have already retrieved in the timer interrupt, and just retrieve the new data.
So you would implement a driver where data is retrieved from the buffer on any of a DMA half-transfer, full-transfer or timer interrupt, and in the timer handler you maintain a count or index to the last buffer position retrieved so that in a half or full transfer you only retrieve the new data.
If your buffer is long and your packets small, and the data is not streaming, you could get several timer interrupts between DMA interrupts. In that case you might make your timeout smaller than the DMA buffer fill time. The optimal solution may depend on the nature of the data being transferred and the latency you can sustain.
If you can sustain the interrupt rate of one interrupt per character, then it will always be simpler to avoid DMA for receiving data, but it will never be "ineffective", just more complicated to get right. Your problem here is not that small DMA transfers are ineffective, but rather that your implementation is sub-optimal.

Related

STM32F4 Timer Triggered DMA SPI – NSS Problem

I have a STM32F417IG microcontroller an external 16bit-DAC (TI DAC81404) that is supposed to generate a Signal with a sampling rate of 32kHz. The communication via SPI should not involve any CPU resources. That is why I want to use a timer triggered DMA to shift the data with a rate of 32kHz to the SPI data register in order to send the data to the DAC.
Information about the DAC
Whenever the DAC receives a channel address and the new corresponding 16bit value the DAC will renew its output voltage to the new received value. This is achieved by:
Pulling the CS/NSS/SYNC – pin to low
Sending the 24bit/3 byte long message and
Pulling the CS back to a high state
The first 8bit of the message are containing among other information the information where the output voltage should be applied. The next and concurrently the last 16bit are containing the new value.
Information about STM32
Unfortunately the microcontroller of ST are having a hardware problem with the NSS-pin. Starting the communication via SPI the NSS-pin is pulled low. Now the pin is low as long as SPI is enabled (. (reference manual page 877). That is sadly not the right way for communicate with device that are in need of a rise of the NSS after each message. A “solution” would be to toggle the NSS-pin manually as suggested in the manual (When a master is communicating with SPI slaves which need to be de-selected between transmissions, the NSS pin must be configured as GPIO or another GPIO must be used and toggled by software.)
Problem
If DMA is used the ordinary way the CPU is only used when starting the process. By toggling the NSS twice every 1/32000 s this leads to corresponding CPU interactions.
My question is whether I missed something in order to achieve a communication without CPU.
If not my goal is now to reduce the CPU processing time to a minimum. My pIan is to trigger DMA with a timer. So every 1/32k seconds the data register of SPI is filled with the 24bit data for the DAC.
The NSS could be toggled by a timer interrupt.
I have problems achieving it because I do not know how to link the timer with the DMA of the SPI using HAL-functions. Can anyone help me?

This is a tricky one. It might be difficult to avoid having one interrupt per sample with this combination of DAC and microcontroller.
However, one approach I would look at is to have the CS signal created as a timer output-compare (like PWM). You can use multiple channels of the same timer or link multiple timers to create a delay between the CS output and the DMA trigger. You should allow some room for jitter, because depending on what else is happening the DMA might not respond instantly. This won't hurt your DAC output signal though, because it only outputs the value on the rising edge of chip select (called SYNC in the DAC datasheet) which will still be from your first timer.

How does UDP SetWriteBuffer and SetReadBuffer the OS's buffers?

Description
I'm busy writing a high frequency UDP server with Go. I'd estimate at least 1000 packets/second both ways.
However as the size of data I'm sending over the UDP socket grew, I eventually ran into the follow error: read udp 127.0.0.1:1541->127.0.0.1:9737: wsarecv: A message sent on a datagram socket was larger than the internal message buffer or some other network limit, or the buffer used to receive a datagram into was smaller than the datagram itself.
I eventually just grew the size of the buffers I was reading from and writing into as follows:
buffer := make([]byte, 64 * 1024 * 1024) // used to just be 1024
l, err := s.socketSim.Read(buffer)
This worked fine and I stopped getting the error... However then I can across two functions inside the net package:
s.socketSim.SetWriteBuffer(64 * 1024 * 1024)
s.socketSim.SetReadBuffer(64 * 1024 * 1024)
I learned that these two act on the operating system's transmit buffer
Question
Do I even care to set the operating system buffer size and why? How does the size on the application buffer impact the size of the operating system buffer? Should they always be the same and how big should/can they become?

First, not only do you have an MTU size for each interface on your device and whatever destination you're send/recving from, but there is also an MTU size for each device in between. For this reason, as others have mentioned, you might want to use what is generally accepted for MTU since you might not control every device in the data route. In the case of UDP, MTU really just means how big a datagram can be before fragmenting.
Second, you almost certainly want your SND/RCV buffers to be larger than the MTU. These are kernel buffers which hold on to data when you're not ready to receive them. A larger UDP RCV buffer means that the kernel will buffer more packets for you instead before dropping them into the abyss. Maybe you have some non-trivial work to do for each packet. Depending on the bitrate, you might want a larger or smaller kernel buffer.
Finally, you're using UDP. There is no guarantee that you'll receive packets in order or at all. Any router in between you and a peer could decide to drop the packet for any reason. Since you're using UDP, you should prepare for dropped and out-of-order packets. You also might need some sort of retransmission mechanism, which further complicates things.
Or you might consider using TCP if dropped packets are unacceptable, knowing that timing is indeterminate.
If you're on linux, you can see current buffer sizes in /proc/sys/net. Usually the kernel will double what you ask for.
Also, you can tune your buffer size by watching for packet drops in /proc/net/udp. If you see drops, you might want to make your rcv buffer bigger, especially if the data is bursty and the processing intensive. If you're data is coming in at a consistent rate and you're still dropping packets, then you aren't processing them fast enough.

How to run a periodic thread in high frequency(> 100kHz) in a Cortex-M3 microcontroller in an RTOS?

I'm implementing a high frequency(>100kHz) Data acquisition system with an STM32F107VC microcontroller. It uses the spi peripheral to communicate with a high frequency ADC chip. I have to use an RTOS. How can I do this?
I have tried FreeRTOS but its maximum tick frequency is 1000Hz so I can't run a thread for example every 1us with FreeRTOS. I also tried Keil RTX5 and its tick frequency can be up to 1MHz but I studied somewhere that it is not recommended to set the tick frequency high because it increases the overall context switching time. So what should I do?
Thanks.

You do not want to run a task at this frequency. As you mentioned, context switches will kill the performance. This is horribly inefficient.
Instead, you want to use buffering, interrupts and DMA. Since it's a high frequency ADC chip, it probably has an internal buffer of its own. Check the datasheet for this. If the chip has a 16 samples buffer, a 100kHz sampling will only need processing at 6.25kHz. Now don't use a task to process the samples at 6.25kHz. Do the receiving in an interrupt (timer or some signal), and the interrupt should only fill a buffer, and wake up a task for processing when the buffer is full (and switch to another buffer until the task has finished). With this you can have a task that runs only every 10ms or so. An interrupt is not a context switch. On a Cortex-M3 it will have a latency of around 12 cycles, which is low enough to be negligible at 6.25kHz.
If your ADC chip doesn't have a buffer (but I doubt that), you may be ok with a 100kHz interrupt, but put as little code as possible inside.
A better solution is to use a DMA if your MCU supports that. For example, you can setup a DMA to receive from the SPI using a timer as a request generator. Depending on your case it may be impossible or tricky to configure, but a working DMA means that you can receive a large buffer of samples without any code running on your MCU.

I have to use an RTOS.
No way. If it's a requirement by your boss or client, run away from the project fast. If that's not possible, communicate your concerns in writing now to save your posterior when the reasons of failure will be discussed. If it's your idea, then reconsider now.
The maximum system clock speed of the STM32F107 is 36 MHz (72 if there is an external HSE quartz), meaning that there are only 360 to 720 system clock cycles between the ticks coming at 100 kHz. The RTX5 warning is right, a significant amount of this time would be required for task switching overhead.
It is possible to have a timer interrupt at 100 kHz, and do some simple processing in the interrupt handler (don't even think about using HAL), but I'd recommend investigating first whether it's really necessary to run code every 10 μs, or is it possible to offload something that it would do to the DMA or timer hardware.

Since you only have a few hundred cycles (instructions) between input, the typical solution is to use an interrupt to be alerted that data is available, and then the interrupt handler put the data somewhere so you can process them at your leisure. Of course if the data comes in continuously at that rate, you maybe in trouble with no time for actual processing. Depending on how much data is coming in and how frequent, a simple round buffer maybe sufficient. If the amount of data is relatively large (how large is large? Consider that it takes more than one CPU cycle to do a memory access, and it takes 2 memory accesses per each datum that comes in), then using DMA as #Elderbug suggested is a great solution as that consumes the minimal amount of CPU cycles.

There is no need to set the RTOS tick to match the data acquisition rate - the two are unrelated. And to do so would be a very poor and ill-advised solution.
The STM32 has DMA capability for most peripherals including SPI. You need to configure the DMA and SPI to transfer a sequence of samples directly to memory. The DMA controller has full and half transfer interrupts, and can cycle a provided buffer so that when it is full, it starts again from the beginning. That can be used to "double buffer" the sample blocks.
So for example if you use a DMA buffer of say 256 samples and sample at 100Ksps, you will get a DMA interrupt every 1.28ms independent of the RTOS tick interrupt and scheduling. On the half-transfer interrupt the first 128 samples are ready for processing, on the full-transfer, the second 128 samples can be processed, and in the 1.28ms interval, the processor is free to do useful work.
In the interrupt handler, rather then processing all the block data in the interrupt handler - which would not in any case be possible if the processing were non-deterministic or blocking, such as writing it to a file system - you might for example send the samples in blocks via a message queue to a task context that performs the less deterministic processing.
Note that none of this relies on the RTOS tick - the scheduler will run after any interrupt if that interrupt calls a scheduling function such as posting to a message queue. Synchronising actions to an RTOS clock running asynchronously to the triggering event (i.e. polling) is not a good way to achieve highly deterministic real-time response and is a particularly poor method for signal acquisition, which requires a jitter free sampling interval to avoid false artefacts in the signal from aperiodic sampling.
Your assumption that you need to solve this problem by an inappropriately high RTOS tick rate is to misunderstand the operation of the RTOS, and will probably only work if your processor is doing no other work beyond sampling data - in which case you might not need an RTOS at all, but it would not be a very efficient use of the processor.

Reading PWM signals in STM32F407

I am doing a quadcopter using a STM32F407 discovery. I was finally able to stabilize it. Now i am trying to use the RC receiver so i can control my quadcopter movements. Is there a way to read the signal of PWM of my RC receiver channels ??
Also my RC receiver supports PPM and according to what i understand it receives a packet of duty cycles strong textbut still don't know how to receive this.

You can use the SPI interface to encode the PPM (or the PWM) signal of your RC receiver.
General approach:
Connect the PPM signal to the MISO pin and a second one of the controller (simultaneously). MOSI, CLK, and CS pins are not needed.
Initialize the SPI interface with a appropriate clock. With this frequency the signal will be shifted in the controller. Try to use 4kHz.
Depending on the idle state of the signal enable either rising or falling edge interrupt trigger on the second pin. This will used to trigger incoming frames.
If the interrupt occurs disable the trigger temporary and start spi transmission to get several bytes (outgoing ingored and not connected). Depending on the Frame length 8 or 10 Bytes should do it. This will catch frames up to 20ms.
After you get the all bytes enable the trigger again and repeat for the next frame.
The received data should contain the pattern of the pwm/ppm signal.
You should also match the sampling rate and the amount of bytes to receive with your RC receiver.

Using MATLAB to send multiple serial signals through the same port

I'd like to send multiple signals (4 inputs and outputs and 7 outputs) from my Laptop to a microcontroller. I'm thinking of using a USB to serial converter and multiplexing the data through the port. I'll need to write codes both in the laptop end and in the microcontroller to multiplex the data.
Eg:
Tx of microcontroller:
1.Temperature sensor ADC output->Laptop
2.Voltage sensor to laptop
3.Current Sensor to Laptop
4.Photodiode current to Laptop
So I need to write a program in the microcontroller to send the data in this order. How can I accomplish this? I was thinking of an infinite loop which sends the data with time delays in between.
At the Rx pin of Microcontroller,
Seven bit sequences. Each bit sequence will be used to set the duty cycle of a PWM generated by the microcontroller.
I also need the same multiplexing or demultiplexing arrangement in the matlab end. Here too, I'm thinking of allotting some virtual 'channels' at different instants of time. What kind of algorithm would I need?

In case you always send all the inputs/outputs at the same rate, you could simply pack them into 'packets', which always start with one or more bytes with a fixed value that form a 'packet header'. The only risk is that one of the bytes of the sensor data might have the same value as the start-byte at the moment you try to start receiving bytes and you are not yet synchronized. You can reduce this risk by making the header longer, or by choosing a start-byte that is illegal output for the sensors (typically OxFF or so).
The sending loop on the microcontroller is really easy (pseudocode):
while True:
measure_sensors()
serial.send(START_BYTE)
serial.send(temperature)
serial.send(voltage)
serial.send(current)
serial.send(photodiode)
end while
The receiving loop is a bit more tricky, since it needs to synchronize first:
while True:
data = serial.receive()
if data != START_BYTE:
print 'not synced'
continue #restart at top of while
end if
temperature = serial.receive()
voltage = serial.receive()
current = serial.receive()
photodiode = serial.receive()
do_stuff_with_measurements()
end while
This same scheme can be used for communication in both directions.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse