Forward packets between SR-IOV Virtual Function (VF) NICs - virtualization

I have an Intel 82599ES 10G NIC which supports Intel SR-IOV. I have successfully created 8 virtual functions (VFs) from it and assigned them to 2 qemu/kvm VMs (2 VFs per VM). Both VMs run DPDK applications (warp17 on one and my custom application on the other) using the assigned VFs. What I need to do is test my custom DPDK application by sending traffic through it using warp17. My test setup looks like this,
The red arrow represents the traffic path.
My physical NIC (PF) uses the DPDK poll-mode driver (igb_uio). What I need to do is route traffic between the VFs as shown by the red arrows. I think https://doc.dpdk.org/guides/prog_guide/switch_representation.html explains the switching behavior, but I cannot understand it. warp17 and my custom DPDK application both work perfectly on physical hardware. What I am trying to do is virtualize my test setup to conserve resources. Has anyone tried such a configuration?

Neither the X710 (Fortville) nor the 82599ES (Niantic) ASIC has an internal bridging or forwarding (VEB) feature. The best option is to use a software virtual switch such as SPP or OVS-DPDK, or a custom application, to forward packets via virtio or tap.
If you still want to use the physical NIC (X710 or 82599ES), you will need a connection at the other end running logic that directs packets to the relevant VF (by modifying the destination MAC).
Edit-1: (as of DPDK 20.11) VEB (Virtual Ethernet Bridging) is an option, but specific NIC firmware and a suitable driver are required to create the VEB on the PF and propagate it to the VFs. Once this is done, the NIC can no longer receive packets from the outside world.
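To make the "modify the destination MAC" approach concrete, here is a minimal sketch of a DPDK forwarding loop (the function name, port ids, and single queue are illustrative assumptions; it presumes rte_eal_init and the usual port/queue setup were done elsewhere, and uses the DPDK 20.11 rte_ether_hdr field names):

```c
#include <rte_ethdev.h>
#include <rte_ether.h>
#include <rte_mbuf.h>

#define BURST 32

/* Receive on rx_port, retag the destination MAC to the target VF's
 * address, and transmit on tx_port. The NIC's L2 filters then deliver
 * each frame to the VF owning that MAC. */
static void forward_to_vf(uint16_t rx_port, uint16_t tx_port,
                          const struct rte_ether_addr *vf_mac)
{
    struct rte_mbuf *pkts[BURST];

    for (;;) {
        uint16_t n = rte_eth_rx_burst(rx_port, 0, pkts, BURST);

        for (uint16_t i = 0; i < n; i++) {
            struct rte_ether_hdr *eth =
                rte_pktmbuf_mtod(pkts[i], struct rte_ether_hdr *);
            rte_ether_addr_copy(vf_mac, &eth->d_addr); /* d_addr in 20.11 */
        }

        uint16_t sent = rte_eth_tx_burst(tx_port, 0, pkts, n);
        while (sent < n)              /* free anything the TX queue refused */
            rte_pktmbuf_free(pkts[sent++]);
    }
}
```

The same retagging can be done inside OVS-DPDK or SPP; the essential point is that the NIC delivers a frame to a VF purely on the basis of its MAC/VLAN filters, so whatever forwards the packet must set the destination MAC to the target VF's address.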

Related

Is it possible to connect to Modbus TCP via Ethernet?

Is it possible to connect the Ethernet port (of a Raspberry Pi) directly to a Modbus TCP RJ45 port (such that the devices can talk to each other)? Or is this not possible without a converter?
I am unsure if this is the correct forum, but I believe this should not be specific to the Raspberry Pi.
Short answer - Yes... But....
As per the comments this is possible but there are a few things you will need to do (i.e. some configuration will be needed).
I think it's worth noting that "Modbus TCP RJ45 port" is not really a meaningful term. Modbus is an application layer protocol; it depends upon a number of underlying layers:
Transport layer - TCP
Network layer - IP
Datalink Layer - Ethernet
Physical Layer - Ethernet cable with RJ45 connectors
You don't need to understand this in detail; the point is that before Modbus TCP will work you need a working TCP network (which all Modbus TCP devices support, generally via an RJ45 Ethernet connection). As such, a better question is probably "If I run a CAT-5 cable between a Raspberry Pi and another device (a Modbus TCP unit), will I be able to connect via TCP?" (a lot more people know about TCP/IP networking than Modbus!).
The first thing to consider is Ethernet. Running a cable directly between two older devices will often not work because such devices need a crossover cable. Almost all modern equipment (including the Pi) supports Auto MDI-X, which means the cable will just work. You can also connect the units via a switch (and doing this removes the need for Auto MDI-X).
Next you need to consider the IP layer. When you connect your Pi to your home network it will (usually!) be given an IP address by a DHCP service (usually running on your router). If you are connecting the Pi directly to the device then there will be no DHCP service so you will need to manually assign IP addresses to the devices (and ensure the subnet is correctly configured). A common way to check if an IP connection is working is to use the ping command.
With the lower layers working, Modbus TCP will generally 'just work'. Many Modbus TCP devices also offer browser-based configuration, and checking that you can access it is a good way to confirm that the network link is working.
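If you want to check the TCP layer from code before involving Modbus at all, a minimal connect test looks like the sketch below (the device address 192.168.1.50 is a placeholder; Modbus TCP listens on port 502 by default):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in dev = { 0 };
    dev.sin_family = AF_INET;
    dev.sin_port = htons(502);                         /* standard Modbus TCP port */
    inet_pton(AF_INET, "192.168.1.50", &dev.sin_addr); /* placeholder device IP */

    if (connect(sock, (struct sockaddr *)&dev, sizeof(dev)) == 0)
        puts("TCP connection OK - lower layers are working");
    else
        perror("connect");              /* no route, timeout, refused... */

    close(sock);
    return 0;
}
```

If the connect succeeds, any remaining problem is at the Modbus application layer rather than in the cabling or IP configuration.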
One further question is probably "should I do this?"; it's OK to hook things up this way to make some quick changes. However, generally you will want the Pi to access other network resources, so connect everything to a router (a home router will work; for remote devices a cell router is often used). You can either give the Modbus unit a static IP manually or use the router's configuration pages to assign it a static DHCP lease (otherwise its IP might change from time to time).

How to intercept IP packets going to the kernel in Linux

I need to create a TCP session "manually", without using the connect() function. I have tried to use raw sockets, but in this case I only get copies of the incoming IP packets. The original incoming packets slip through to the kernel, which generates an ACK response packet that damages my protocol.
Next, variant 2: I could write a virtual Ethernet interface driver (kernel module) and route incoming traffic to it using iptables. But the machine runs a patched, non-vanilla kernel, so normal linking of the module against the kernel is not possible.
Variant 3: I also tried not assigning an IP address to the NIC interface. In this case the kernel's TCP/IP layer is not activated, and it is possible to generate and receive arbitrary IP packets at the link (Ethernet) layer using the PF_PACKET socket domain in the socket() call. But then no other applications using TCP/IP can work.
How can this problem be solved in other ways?
It would be nice if it were possible to intercept packets going from the network interface to the kernel, that is, to intercept the sk_buff. But I don't know how to realize that.
Apparently you are trying to create a tunnel. Instead of trying to hijack an existing interface, the proper way to create a tunnel is to create a new interface, using a kernel module or TUN/TAP. However, tunnels are normally intended to receive traffic generated on the machine which runs the tunnel software, or at least routed through it. That means you will also have to set up the kernel to route the traffic to your tunnel.
You can create a new interface as a TUN/TAP interface. It is like a virtual ethernet driver except you don't need to write a new kernel module. It is designed for tunnels (hence the name).
The difference between TUN and TAP is that a TUN interface is an IP interface that receives IP packets from the kernel's IP routing system, and a TAP interface receives Ethernet packets (which may contain IP packets) so it can alternatively be part of a bridge (a virtual Ethernet switch - which only looks at the Ethernet header, not the IP header).
I think for your scenario, you will find it easiest to create a TAP interface, then create a bridge (virtual Ethernet switch) between the TAP interface, and the interface which the other host is connected to. Neither one needs an IP address - the kernel will happily pass Ethernet-layer traffic without attempting to process the IP information in the packet. Your tunnel software can then emulate a host - or tunnel to an actual host - or whatever you want it to do.
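For reference, a minimal sketch of creating such a TAP interface in C (the name "tap0" is an arbitrary choice):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/if_tun.h>

int main(void)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0) { perror("open /dev/net/tun"); return 1; }

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;  /* TAP = Ethernet frames, no extra header */
    strncpy(ifr.ifr_name, "tap0", IFNAMSIZ - 1);

    if (ioctl(fd, TUNSETIFF, &ifr) < 0) { perror("TUNSETIFF"); return 1; }

    /* The interface exists for as long as this fd stays open
     * (or until you mark it persistent with TUNSETPERSIST).
     * read() returns one Ethernet frame sent to tap0; write() injects one. */
    unsigned char frame[2048];
    ssize_t n = read(fd, frame, sizeof(frame));
    if (n > 0)
        printf("received a %zd-byte Ethernet frame\n", n);

    close(fd);
    return 0;
}
```

Once the interface is up you can bridge it to the physical interface, e.g. `ip link add br0 type bridge`, then `ip link set tap0 master br0` and `ip link set eth0 master br0`; as noted above, none of these need an IP address.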
If you want the host to also be able to talk to the machine running the tunnel software - without going through the tunnel software - then you may choose to put an IP address on the bridge.

How to enable RSS on ETH_RSS_VXLAN or ETH_RSS_GENEVE for a DPDK application?

I am trying to find out what parameters are used to calculate the RSS hash when tunnelled RSS hash offloads such as ETH_RSS_VXLAN or ETH_RSS_GENEVE are used. The goal is to distribute incoming VXLAN traffic based on VNI rather than the outer IP or UDP port number in DPDK.
DPDK version: 20.11.1
NIC: Mellanox ConnectX-5, firmware version 16.30.1004
I have been testing the different RSS hash options, using the inner IP fields to calculate the hash. Settings applied: RSS settings inside the L3FWD application.
[EDIT-1 based on comment conversation]
The NIC I am using does not support ETH_RSS_VXLAN or the other tunnel RSS offloads, so I am unable to test them.
I am not making use of RTE_FLOW, since I am using the sample l3fwd application.
As of DPDK 20.11.1 and the MLX5 PMD, enabling VXLAN-based RSS via rte_eth_dev_configure is not available. The PMD's support for MLX5_EXPANSION_OUTER_IPV4_UDP, MLX5_EXPANSION_OUTER_IPV6_UDP, MLX5_EXPANSION_VXLAN and MLX5_EXPANSION_VXLAN_GPE exists for RTE_FLOW VXLAN encap/decap on the switch, while rte_eth_dev_configure is a generic API covering all NICs.
So if the real intention is to distribute packets based on the inner IP, I highly recommend using RTE_FLOW with a flow match on the outer IP and UDP port numbers and an RSS action on the inner IP and port number. The L3FWD application has to be modified accordingly.
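A hedged sketch of such a rule using the rte_flow API (DPDK 20.11 style; the port id, the four queues, and the lack of error handling are all simplifications):

```c
#include <rte_ethdev.h>
#include <rte_flow.h>

/* Match VXLAN-encapsulated ingress traffic and RSS-distribute it
 * over queues 0..3 based on the INNER IP/UDP fields (level = 2). */
static struct rte_flow *vxlan_inner_rss(uint16_t port_id)
{
    struct rte_flow_attr attr = { .ingress = 1 };

    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4 },   /* outer IP */
        { .type = RTE_FLOW_ITEM_TYPE_UDP },    /* outer UDP (VXLAN port) */
        { .type = RTE_FLOW_ITEM_TYPE_VXLAN },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };

    static const uint16_t queues[] = { 0, 1, 2, 3 };
    struct rte_flow_action_rss rss = {
        .level = 2,                            /* 2 = hash on the inner headers */
        .types = ETH_RSS_IP | ETH_RSS_UDP,
        .queue_num = 4,
        .queue = queues,
    };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

    struct rte_flow_error err;
    return rte_flow_create(port_id, &attr, pattern, actions, &err);
}
```

The testpmd equivalent is roughly `flow create 0 ingress pattern eth / ipv4 / udp / vxlan / end actions rss level 2 types ip udp end queues 0 1 2 3 end / end`; `level 2` is what selects the inner headers for hashing.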
Other DPDK-supported NICs also support VNI matching (to varying degrees):
Marvell/QLogic qede
Marvell ThunderX
Intel FVL (Fortville) and CVL (Columbiaville)
On Intel FVL and CVL the tunnel lookup can be done in hardware, followed by RSS distribution to specific queues. A generic setup:
1. For FVL, identify NIC firmware that supports VXLAN from the Intel site for your NIC.
2. Using the nvmupdate tool, flash the FVL with that firmware and restart the machine.
3. Using a DPDK application (for example testpmd), create 5 RX queues.
4. Set a flow rule: if traffic is tunnelled with VXLAN, then apply action 1) RSS on the inner IP across Q1 to Q4 and action 2) decap the tunnel header.
Refer to the sample tests for more details.
Note: I highly recommend opening a DPDK vendor-specific question with details of the sample program, the steps followed, and the issues faced.

UDP packets received only in promiscuous mode

I am generating UDP packets on 100 multicast groups on one Ubuntu 16.04 VM and subscribing to those groups on another Ubuntu 16.04 VM. Both run on an HP server under Hyper-V Manager. The problem is that my application only receives 2 out of the 100 groups. However, when Wireshark is capturing, the application starts receiving all messages.
I found several other similar questions, like this one, where it is explained that because Wireshark runs in promiscuous mode it allows all packets to get through (through what?), and this explains why my application starts "seeing" them too. Indeed, switching the Ethernet interface to promiscuous mode allows the application to receive all the messages without running Wireshark.
But what is the problem with the other packets that are not normally received? I cross-verified the hex dumps of the "good" and "bad" messages and they don't seem to differ. The checksums at the IP and UDP levels are correct. What else could be the problem?
Multicast IP range: 239.1.4.1 - 239.1.4.100
Destination port: 50003
Source port range: ~33000 - 60900
Firewall: disabled
EDIT:
It looks like when the application is subscribed to only 8 multicast groups it works fine; however, if subscribed to more than 8, it receives only 2 (if they end in .7 or .8) or none, as described above. So I would assume that the packets are correct. Could the problem be in the network settings? Or in the application itself? I need to find the bug in a script I did not write.
EDIT2:
I installed the ISO image on another machine (VirtualBox instead of the HP Windows Server) and it works as it should. Thus, I assume my application works fine and all the Ubuntu OS configurations are correct. Now I put all the blame on the Hyper-V Manager/settings. Any ideas?
It sounds as if you didn't tell the kernel about them.
See http://tldp.org/HOWTO/Multicast-HOWTO-6.html
You have to use setsockopt with IP_ADD_MEMBERSHIP. And be sure to use the correct values for your local interfaces.
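A minimal sketch of such a subscription loop, using the group range and port from the question (INADDR_ANY as the joining interface is an assumption; pass the local interface's address instead to be explicit):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port = htons(50003);          /* destination port from the question */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(sock, (struct sockaddr *)&addr, sizeof(addr));

    /* Join 239.1.4.1 .. 239.1.4.100 - each group needs its own membership. */
    for (int i = 1; i <= 100; i++) {
        char group[32];
        snprintf(group, sizeof(group), "239.1.4.%d", i);

        struct ip_mreq mreq;
        mreq.imr_multiaddr.s_addr = inet_addr(group);
        mreq.imr_interface.s_addr = htonl(INADDR_ANY); /* or the local NIC's IP */
        if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                       &mreq, sizeof(mreq)) < 0)
            perror(group);                 /* fails past igmp_max_memberships */
    }

    char buf[1500];
    ssize_t n = recv(sock, buf, sizeof(buf), 0);
    printf("received %zd bytes\n", n);
    close(sock);
    return 0;
}
```

Two related limits are worth knowing: Linux caps memberships per socket via the net.ipv4.igmp_max_memberships sysctl (20 by default, so joining all 100 groups requires raising it), and NICs and virtual switches have finite multicast filter tables, beyond which the interface must fall back to all-multicast or promiscuous mode; the latter is consistent with the Hyper-V behaviour described above.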

Virtualization aware switches

According to http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns892/ns894/white_paper_c11-525307.html
each virtual machine is given a dedicated network interface card. My question is: how does a server containing about 10 virtual machines ever support 10 NICs?
Those NICs are probably virtual. Packets from them are routed to the physical NIC(s) and the other way around. It's much like a home network: your Internet Service Provider gives you a single Ethernet port on the modem, but the router you connect to it offers several Ethernet ports to which you can connect multiple PCs.
The NICs can be physical too, made accessible to the VMs either directly (e.g. via PCI passthrough or SR-IOV) or indirectly.