I have a network card (eth0) with a single queue, and its IRQ number is 63.
My question is:
If I set /proc/irq/63/smp_affinity to fffff,
does that mean the Linux kernel will distribute the IRQ of eth0 across every CPU in my system?
Is its function equivalent to RPS (Receive Packet Steering)?
No, smp_affinity is a bitmask of the CPUs allowed to handle this IRQ (smp_affinity_list is the equivalent CPU-list form). For example, if set to 0x1 it will pin that IRQ to CPU 0 ...
No. Setting smp_affinity to fffff just means that the kernel can use any CPU covered by that mask to handle IRQ 63; the interrupt controller still typically delivers each interrupt to only one of those CPUs at a time.
If you want to distribute packet processing load with a NIC that only has a single RX queue, you have to use RPS.
Check out a blog post I wrote that explains all of this and more in detail.
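To make that concrete, here is a minimal sketch (assuming IRQ 63 and eth0 from the question, a 20-CPU mask, and root privileges) that sets both the IRQ affinity and the per-queue RPS mask:

#include <stdio.h>
#include <stdlib.h>

static void write_mask(const char *path, const char *mask)
{
    FILE *f = fopen(path, "w");
    if (!f || fputs(mask, f) == EOF) {
        perror(path);
        exit(1);
    }
    fclose(f);
}

int main(void)
{
    /* Allow any of the first 20 CPUs to service the IRQ itself ... */
    write_mask("/proc/irq/63/smp_affinity", "fffff");
    /* ... and let RPS spread packet *processing* across the same CPUs. */
    write_mask("/sys/class/net/eth0/queues/rx-0/rps_cpus", "fffff");
    return 0;
}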
I'm currently starting to use ThreadX on an STM32 Nucleo-H723ZG (STM32H723ZG MCU).
I noticed that when loading the Nx_TCP_Echo_Server / Nx_TCP_Echo_Client projects from CubeMX, the RAM gets filled up pretty much to the top, which makes me wonder how I'm supposed to add my own code and data here.
Since I'm pretty new to RAM partitioning, RTOSes and the like, I don't have a good feeling for what is wrong or right and how to proceed (or whether this is a problem at all).
Nevertheless, I wonder whether the RAM could be freed up, perhaps by partitioning it differently or by dropping some non-essential code parts.
Or, to think about it differently:
Since RAM_D1 got filled, but RAM_D2, RAM_D3 and DTCMRAM are pretty much empty, is there a way to use the free RAM for my own purposes? (I would like to let SPI and ADC processing run via DMA, so this needs a place to go ...)
Hope my questions are not too confusing ;)
The system has the following amount of RAM, according to STM:
"SRAM: total 564 Kbytes all with ECC, including 128 Kbytes of data TCM RAM for critical real-time data + 432 Kbytes of system RAM (up to 256 Kbytes can remap on instruction TCM RAM for critical real time instructions) + 4 Kbytes of backup SRAM (available in the lowest-power modes)" (see STMs STM32H723ZG MCU product page)
Down below you'll find screenshots of the current RAM usage; in RAM_D1 especially, .tcp_sec eats up most of the RAM.
--> Can .tcp_sec be optimized or kicked out?
If tcp here means the TCP protocol, maybe that is a way to optimize, since I'm not sure whether I need a handshake etc.; maybe UDP is sufficient (and faster for the ADC data streaming) ... what do you say?
Edit:
The linker file shows that .tcp_sec is marked (NOLOAD) ... is NOLOAD maybe a hint at a "placebo" RAM occupation (pre-allocation / reservation, but no actual usage)?
Linker-script extract:
/* User_heap_stack section, used to check that there is enough RAM left */
._user_heap_stack :
{
  . = ALIGN(8);
  PROVIDE ( end = . );
  PROVIDE ( _end = . );
  . = . + _Min_Heap_Size;
  . = . + _Min_Stack_Size;
  . = ALIGN(8);
} >RAM_D1

.tcp_sec (NOLOAD) : {
  . = ABSOLUTE(0x24048000);
  *(.RxDecripSection)

  . = ABSOLUTE(0x24048060);
  *(.TxDecripSection)
} >RAM_D1 AT> FLASH
For context:
I am developing a "system controller", where my plan is to have it running a RTOS, which manages the read-in of analog values, writing control messages via SPI to two other STMs of the same kind and communicating via Ethernet to my desktop application.
The desktop application is then in charge of post-processing the digitized analog values and sending control messages to the system controller. In the best case the system controller digitizes the analog signal on ADC3 with 5 MSPS (at probably 6 Bit resolution = 30 MBit/s) and sends that data hickup-free to my desktop application.
-> Is this plan possible on this MCU?
I tried to buy a higher (more RAM) version of the nucleo I've got, but due to shortages this one is the best one I was able to get.
For the RTOS I'd like to stick with ThreadX, since FreeRTOS support in STM32IDE seems to be phased out now, after ThreadX was employed as the RTOS by STM.
(I like the easy register configuration using CubeMX/STM32 IDE, hence my drive to use that SW universe ... if there are good reasons to use a different RTOS, tell me :) )
Thank you for your time!
I generated the same project on my side and took a look. I believe you should be able to implement what you want on this MCU, but you will need to use the available memory carefully.
There seems to be some confusion about the section .tcp_sec. It contains the DMA reception and transmission descriptors for the Ethernet controller/driver. These are constrained by the driver and hardware to sit at specific addresses. The descriptors themselves are rather small, but the packet buffers are bigger; with some work those can be reduced. If you are using Ethernet you will need this section no matter whether you use TCP or not. As I said, the name can be confusing.
The flash still has plenty of space available: in the debug configuration only about 11% is used. The rest is available for your application code.
You can locate your data in other memory regions. How you tell the compiler/linker where your data goes depends on the toolchain you use. Look towards the top of the main.c file in that example to see how the DMA descriptors are assigned to a specific section for three different toolchains (IAR, ARM MDK, GCC).
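For illustration, a minimal sketch of the GCC variant (the section names and buffer sizes here are assumptions; your linker script would need matching output sections placed in RAM_D2 and DTCMRAM):

#include <stdint.h>

/* ADC sample buffer in D2 SRAM, where DMA1/DMA2 can reach it */
static uint16_t adc_buf[4096] __attribute__((section(".adc_buf_d2")));

/* Fast scratch data in DTCM; note that on the H7 the DTCM is only
   accessible to the CPU and the MDMA, not to DMA1/DMA2 */
static float filter_state[64] __attribute__((section(".dtcm_data")));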
In terms of how to most efficiently use and configure the microcontroller peripherals, please get in touch with STMicro; they will know best.
This should get you started. Let us know if this helps!
I have a Cypress S25FL064 flash memory connected through a QSPI interface to an STM32H753.
The memory boots in "single IO" mode (i.e. the classical SPI protocol with MISO/MOSI signals).
In this mode I am able to erase, program and read the memory through its command set.
Then I switch to "quad IO" and "memory mapped" mode.
In this configuration:
when the CPU sends "short" read commands (ex: I read a byte in my software -> the CPU sends a 4 bytes read command to the memory) everything works fine
when the CPU sends "long" read commands (ex. i do a memcpy from flash to internal RAM -> the CPU sends a single N bytes read command) the read data is corrupted after 10 bytes or so. However the signals on a scope seem correct.
The SPI clock is pretty slow (6.25MHz) compared to QPSI peripheral clock (200MHz) and CPU clock (400MHz).
About GPIOs I've let all IOs in high impedance mode (ie. bits 00 in GPIOx_PUPDR register) as adviced in the flash user manual. Pins frequency is medium.
The flash user manual mentions explicitely that it is possible to read the whole memory in a single command.
Any idea why "long" read commands would fail ? It looks like a synchronisation issue but I don't see what setting could cause that.
I have configured a UART to receive in DMA mode with a buffer size of around 64 bytes. So, basically, the HAL_UART_RxCpltCallback() DMA receive-complete interrupt will only fire when 64 chars have been received.
Is there a way in STM32 to configure a timeout for DMA Rx, so that when the buffer is only partially filled (i.e. fewer than 64 chars have been received) and no more chars arrive for a specified time, the DMA raises the same HAL_UART_RxCpltCallback() interrupt to let the consumer consume whatever partial data has been received on the UART?
You can use the UART Idle detection interrupt in parallel with the DMA interrupt.
I have used this multiple times with STM32F0xx processors and it works perfectly.
Idle detection should be available on F4 and F7 processors too.
There are some tutorials on the internet which target your problem and also provide the solution with Idle detection.
E.g. check out this one.
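A minimal sketch of the idea (the HAL macros are standard, but the handle names huart1/hdma_usart1_rx, the family header and the 64-byte buffer are assumptions taken from the question):

#include "stm32f4xx_hal.h"   /* pick the header for your family */

#define RX_LEN 64
static uint8_t rx_buf[RX_LEN];
extern UART_HandleTypeDef huart1;
extern DMA_HandleTypeDef hdma_usart1_rx;

void rx_start(void)
{
    HAL_UART_Receive_DMA(&huart1, rx_buf, RX_LEN);
    __HAL_UART_ENABLE_IT(&huart1, UART_IT_IDLE);   /* interrupt when the line goes idle */
}

void USART1_IRQHandler(void)
{
    if (__HAL_UART_GET_FLAG(&huart1, UART_FLAG_IDLE)) {
        __HAL_UART_CLEAR_IDLEFLAG(&huart1);
        /* bytes received so far = buffer size - remaining DMA count */
        uint16_t received = RX_LEN - __HAL_DMA_GET_COUNTER(&hdma_usart1_rx);
        /* hand rx_buf[0 .. received) to the consumer here */
        (void)received;
    }
    HAL_UART_IRQHandler(&huart1);   /* let HAL handle the usual flags */
}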
It's easy, but you have to use the USART receiver timeout interrupt instead.
In order to get a count of transferred bytes, you can use the DMA_CNDTRx or DMA_SxNDTR register (the name differs between STM32 families; x is the channel/stream number).
This register decrements after each DMA transfer. Once the transfer is complete, it either stays at zero or is automatically reloaded with the previously programmed value if the channel is configured in auto-reload (circular) mode.
Unfortunately, the STM HAL doesn't provide an API for this, so you have to implement it yourself.
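For illustration, a register-level sketch (STM32F4-style names; the stream assignment and buffer size are assumptions):

#include "stm32f4xx.h"   /* CMSIS device header, family-specific */

#define RX_LEN 64

/* Bytes received so far on DMA2 Stream 2 (here assumed to serve the
   USART Rx), given that the stream was started with NDTR = RX_LEN. */
static inline uint16_t uart_rx_count(void)
{
    return (uint16_t)(RX_LEN - DMA2_Stream2->NDTR);
}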
I have several kernel modules which need to interact with userspace. Hence, each module has a Netlink socket.
My problem is that these sockets interfere with each other. This is because all of them register to the same Netlink address family (because there aren't many available to begin with - the max is 32 and more than half are already reserved) and also because they all bind themselves to the same pid (the kernel pid - zero).
I wish there were more room for address families. Or, better yet, I wish I could bind my sockets to other pids. How come Netlink is the preferred user-kernel channel if only 32 sockets can be open at any one time?
libnl-3's documentation says
The netlink address (port) consists of a 32bit integer. Port 0 (zero) is reserved for the kernel and refers to the kernel side socket of each netlink protocol family. Other port numbers usually refer to user space owned sockets, although this is not enforced.
That last claim seems to be a lie right now. The kernel uses a constant as the pid and doesn't export more versatile functions:

    if (netlink_insert(sk, 0))
        goto out_sock_release;
I guess I can recompile the kernel and increase the address family limit. But these are kernel modules; I shouldn't have to do that.
Am I missing something?
No.
Netlink's family count limit is why Generic Netlink exists.
Generic Netlink is a layer on top of stock Netlink. Instead of opening a new socket, you register a callback on an already established socket and listen for messages directed to a "sub"-family there. Given that there are more available family slots (1023) and no ports, I'm assuming they felt a separation between families and ports was unnecessary at this layer.
To register a listener in kernelspace, use genl_register_family() or its siblings. In userspace, Generic Netlink can be used via libnl-3's API (the documentation is rather limited, but the code is open and speaks for itself).
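A minimal sketch of the kernel-side registration (the family name, command number and handler are invented for the example; note that the genl API has changed over kernel versions, and the layout below matches recent kernels where the ops table is embedded in the family):

#include <linux/module.h>
#include <net/genetlink.h>

static int demo_echo_doit(struct sk_buff *skb, struct genl_info *info)
{
    /* handle the message here */
    return 0;
}

static const struct genl_ops demo_ops[] = {
    { .cmd = 1, .doit = demo_echo_doit },
};

static struct genl_family demo_family = {
    .name    = "DEMO_FAMILY",   /* looked up by name from userspace */
    .version = 1,
    .maxattr = 0,
    .ops     = demo_ops,
    .n_ops   = ARRAY_SIZE(demo_ops),
    .module  = THIS_MODULE,
};

static int __init demo_init(void)
{
    return genl_register_family(&demo_family);
}

static void __exit demo_exit(void)
{
    genl_unregister_family(&demo_family);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");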
You are confused by the MAX_LINKS variable name. It is not the "maximum number of links", it's the maximum number of families. The things you listed are netlink families, or, in other words, netlink groups. There are indeed 32 families, each dedicated to some particular purpose. For example, NETLINK_SELINUX is for SELinux notifications and NETLINK_KOBJECT_UEVENT is for kobject notifications (these are what udev handles).
But there are no restrictions on the number of sockets within each family.
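For instance, a minimal userspace sketch (NETLINK_USERSOCK is chosen arbitrarily here): binding with nl_pid = 0 lets the kernel auto-assign a distinct port to each socket, so any number of them can coexist within one family:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void)
{
    struct sockaddr_nl addr;
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = 0;   /* 0 = let the kernel pick a unique port */

    int a = socket(AF_NETLINK, SOCK_RAW, NETLINK_USERSOCK);
    int b = socket(AF_NETLINK, SOCK_RAW, NETLINK_USERSOCK);
    if (bind(a, (struct sockaddr *)&addr, sizeof(addr)) ||
        bind(b, (struct sockaddr *)&addr, sizeof(addr)))
        perror("bind");
    else
        printf("two sockets of the same family: fd %d and fd %d\n", a, b);
    return 0;
}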
When you call netlink_create, it checks your protocol number, which in the case of a netlink socket is the netlink family, like NETLINK_SELINUX. Look at the code:
static int netlink_create(struct net *net, struct socket *sock, int protocol,
                          int kern)
{
    ...
    if (protocol < 0 || protocol >= MAX_LINKS)
        return -EPROTONOSUPPORT;
    ...
This is how MAX_LINKS is used.
Later, when it actually creates the socket, it invokes __netlink_create, which in turn calls sk_alloc, which in turn calls sk_prot_alloc. In sk_prot_alloc the socket is allocated with kmalloc (netlink doesn't have its own slab cache):
slab = prot->slab;
if (slab != NULL) {
    sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
    if (!sk)
        return sk;
    if (priority & __GFP_ZERO) {
        if (prot->clear_sk)
            prot->clear_sk(sk, prot->obj_size);
        else
            sk_prot_clear_nulls(sk, prot->obj_size);
    }
} else
    sk = kmalloc(prot->obj_size, priority);
I'm using an Atmel SAM3S MCU, and their ASF stuff can do I2C (they call it TWI) communication. That's fine, except it's taking too much time from my main loop.
So, I'd like to be able to kick off a DMA transfer to read the data from the I2C device. However, all the docs say you can't turn on TX and RX simultaneously on a half-duplex device like TWI. The docs do show that there is a Peripheral DMA Controller (PDC) register section in the TWI registers, but I can't find any PDC examples, except for the USART, which is full duplex.
The only thing I can think of to try is to set up the TX section and the next-RX section, and hope that it automatically enables RX after the TX is done.
Has anyone out there used DMA for an I2C read on the SAM3S? If so, could you point me to some docs or examples?
I'm not familiar with the particular part; however, I would suggest that for many common usage patterns your best bet would probably be to use DMA only for the multi-byte sequences of data. Most I2C peripherals allow data to be read out by performing a start with a "write" address byte and, if that is acknowledged, sending out an address or other information about what data is desired. This is followed by a restart and a "read" address byte. If that is acknowledged, one may then perform all but the last of the byte reads with the "ack" flag set. When that is finished, ask for the final byte to be read with the "ack" flag clear.
I'm not sure whether it would be worthwhile to use the DMA controller to clock out the bytes of the requested address, but it is probably not worthwhile to try to use it to clock out the first byte of the read command.
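To make the sequence concrete, here is a sketch of that access pattern using hypothetical i2c_* primitives (not a real SAM3S/ASF API; on the SAM3S they would map onto TWI register operations):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform primitives, declared here only as a sketch. */
extern void i2c_start(void);
extern void i2c_restart(void);
extern void i2c_stop(void);
extern bool i2c_write_byte(uint8_t b);    /* returns true on ACK */
extern uint8_t i2c_read_byte(bool ack);   /* ack = send ACK after the byte */

/* Read `len` bytes starting at device register `reg`, ACKing every
   byte except the last (which gets a NAK to end the read). */
int i2c_read_reg(uint8_t dev, uint8_t reg, uint8_t *buf, size_t len)
{
    i2c_start();
    if (!i2c_write_byte((uint8_t)(dev << 1)))        /* address + write bit */
        return -1;
    if (!i2c_write_byte(reg))                        /* what data is desired */
        return -1;
    i2c_restart();
    if (!i2c_write_byte((uint8_t)((dev << 1) | 1)))  /* address + read bit */
        return -1;
    for (size_t i = 0; i < len; i++)
        buf[i] = i2c_read_byte(i + 1 < len);         /* NAK only the final byte */
    i2c_stop();
    return 0;
}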