Handling concurrent UDP DatagramReceivedFcn executions in Matlab - matlab

I'm attempting to read ocean depth values at multiple frequencies which are being broadcast via UDP packets. What I'm doing is telling the logging program to return the depth values to a specific UDP port, then use the DatagramReceivedFcn to run a function when data is received and essentially save that depth.
u1 = udp(remoteip,dataport18,'ByteOrder','littleEndian','LocalPort',dataport18,'DatagramTerminateMode','off');
set(u1,'InputBufferSize',6000);
u1.DatagramReceivedFcn = {#receivedata18};
fopen(u1);
Thus, when data is received on the port specified in 'dataport18', it will run the function receivedata18(). However, I'm trying to read depth data for multiple frequencies, so I create additional UDP objects:
u2 = udp(remoteip,dataport38,'ByteOrder','littleEndian','LocalPort',dataport38,'DatagramTerminateMode','off');
set(u2,'InputBufferSize',6000);
u2.DatagramReceivedFcn = {#receivedata38};
fopen(u2);
What I'm finding though is that only data for u1 (18 kHz) is being saved. My guess is that since both frequencies ping at the same time, they both send their UDP packets and try to evaluate their respective functions at the same time, which Matlab is not capable of doing.
Is this indeed what is going on? If so, is there any way around this issue so that I can concurrently read depth data that is being sent at the same time from two separate UDP packets?
Thanks!
Update
I'm wondering if I would need the Parallel Computing Toolbox in order to perform this. I have a similar program in Python that is performed in essentially the same way, however it has no issues. I'm assuming that it must be that Matlab can't run simultaneous functions without the Parallel Computing Toolbox

Thought I should update this in case anyone cares. It's not really an answer to my question, but what I'm currently doing that's working.
Instead of having the data sent to different UDP port, I simply have them sent to the same port and then read them sequentially. Thus I'm not reading them synchronously, although that doesn't really slow down the operation much at all.

Related

Omnetpp application sends multiple streams

Let's say I have a car with different sensors: several cameras, LIDAR and so on, the data from this sensors are going to be send to some host over 5G network (omnetpp + inet + simu5g). For video it is like 5000 packets 1400 bytes each, for lidar 7500 packets 1240 bytes and so on. Each flow is encoded in UDP packets.
So in omnetpp module in handleMessage method I have two sentTo calls, each is scheduled "as soon as possible", i.e., with no delay - that corresponds to the idea of multiple parallel streaming. How does omnetpp handle situations, when it needs to send two different packets at the same time from the same module to the same module (some client, which receives sensor data streams)? Does it create some inner buffer on the sender or receiver side, therefore allowing really only one packet sending per handleMessage call or is it wrong? I want to optimize data transmission and play with packet sizes and maybe with sending intervals, so I want to know, how omnetpp handles multiple streaming at the same time, because if it actually buffers, maybe than it makes sense to form a single package from multiple streams, each such package will consist of a certain amount of data from each stream.
There is some confusion here that needs to be clarified first:
OMNeT++ is a discrete event simulator framework. An OMNeT++ model contains modules that communicate with each other, using OMNeT++ API calls like sendTo() and handleMessage(). Any call of the sendTo() method just queues the provided message into the future event queue (an internal, time ordered queue). So if you send more than one packet in a single handleMessage() method, they will be queued in that order. The packets will be delivered one by one to the requested destination modules when the requested simulation time is reached. So you can send as many packets as you wish and those packets will be delivered one by one to the destination's handleMessage() method. But beware! Even if the different packets will be delivered one by one sequentially in the program's logic, they can still be delivered simultaneously considering the simulation time. There are two time concepts here: real-time that describes the execution order of the code and simulation-time which describes the time passes from the point of the simulated system. That's why, while OMNeT++ is a single threaded application that runs each events sequentially it still can simulate infinite number of parallel running systems.
BUT:
You are not modeling directly with OMNeT++ modules, but rather using INET Framework which is a model directly created to simulate internet protocols and networks. INET's core entity is a node which is something that has network interface(s) (and queues belonging to them). Transmission between nodes are properly modeled and only a single packet can travel on an ethernet line at a time. Other packets must queue in the network interface queue and wait for an opportunity to be delivered from there.
This is actually the core of the problem for Time Sensitive Networks: given a lot of pre-defined data streams in a network, how the various packets interfere and affect each other and how they change the delay and jitter statistics of various streams at the destination, Plus, how you can configure the source and network gate scheduling to achieve some desired upper bounds on those statistics.
The INET master branch (to be released as INET 4.4) contains a lot TSN code, so I highly recommend to try to use it if you want to model in vehicle networks.
If you are not interested in the in-vehicle communication, bit rather want to stream some data over 5G, then TSN is not your interest, but you should NOT start to multiplex/demultiplex data streams at application level. The communication layers below your UDP application will fragment/defragment and queue the packets exactly how it is done in the real world. You will not gain anything by doing mux/demux at application layer.

How to receive image over a socket in Dart

I am not able to retrive file through sockets, it divides the byte array if data increases than a certain length. How can I resolve this issue?
Sockets i/o (and especially when using UDP) is a low level communication protocol that gives you a lot of power but also requires you to solve problems like packets getting chopped up, getting dropped or arriving out of order. You can figure that all out yourself, but you may also want to look at using a package like socket.io-client-dart that takes care of all those edge cases.

How exactly do socket receives work at a lower level (eg. socket.recv(1024))?

I've read many stack overflow questions similar to this, but I don't think any of the answers really satisfied my curiosity. I have an example below which I would like to get some clarification.
Suppose the client is blocking on socket.recv(1024):
socket.recv(1024)
print("Received")
Also, suppose I have a server sending 600 bytes to the client. Let us assume that these 600 bytes are broken into 4 small packets (of 150 bytes each) and sent over the network. Now suppose the packets reach the client at different timings with a difference of 0.0001 seconds (eg. one packet arrives at 12.00.0001pm and another packet arrives at 12.00.0002pm, and so on..).
How does socket.recv(1024) decide when to return execution to the program and allow the print() function to execute? Does it return execution immediately after receiving the 1st packet of 150 bytes? Or does it wait for some arbitrary amount of time (eg. 1 second, for which by then all packets would have arrived)? If so, how long is this "arbitrary amount of time"? Who determines it?
Well, that will depend on many things, including the OS and the speed of the network interface. For a 100 gigabit interface, the 100us is "forever," but for a 10 mbit interface, you can't even transmit the packets that fast. So I won't pay too much attention to the exact timing you specified.
Back in the day when TCP was being designed, networks were slow and CPUs were weak. Among the flags in the TCP header is the "Push" flag to signal that the payload should be immediately delivered to the application. So if we hop into the Waybak
machine the answer would have been something like it depends on whether or not the PSH flag is set in the packets. However, there is generally no user space API to control whether or not the flag is set. Generally what would happen is that for a single write that gets broken into several packets, the final packet would have the PSH flag set. So the answer for a slow network and weakling CPU might be that if it was a single write, the application would likely receive the 600 bytes. You might then think that using four separate writes would result in four separate reads of 150 bytes, but after the introduction of Nagle's algorithm the data from the second to fourth writes might well be sent in a single packet unless Nagle's algorithm was disabled with the TCP_NODELAY socket option, since Nagle's algorithm will wait for the ACK of the first packet before sending anything less than a full frame.
If we return from our trip in the Waybak machine to the modern age where 100 Gigabit interfaces and 24 core machines are common, our problems are very different and you will have a hard time finding an explicit check for the PSH flag being set in the Linux kernel. What is driving the design of the receive side is that networks are getting way faster while the packet size/MTU has been largely fixed and CPU speed is flatlining but cores are abundant. Reducing per packet overhead (including hardware interrupts) and distributing the packets efficiently across multiple cores is imperative. At the same time it is imperative to get the data from that 100+ Gigabit firehose up to the application ASAP. One hundred microseconds of data on such a nic is a considerable amount of data to be holding onto for no reason.
I think one of the reasons that there are so many questions of the form "What the heck does receive do?" is that it can be difficult to wrap your head around what is a thoroughly asynchronous process, wheres the send side has a more familiar control flow where it is much easier to trace the flow of packets to the NIC and where we are in full control of when a packet will be sent. On the receive side packets just arrive when they want to.
Let's assume that a TCP connection has been set up and is idle, there is no missing or unacknowledged data, the reader is blocked on recv, and the reader is running a fresh version of the Linux kernel. And then a writer writes 150 bytes to the socket and the 150 bytes gets transmitted in a single packet. On arrival at the NIC, the packet will be copied by DMA into a ring buffer, and, if interrupts are enabled, it will raise a hardware interrupt to let the driver know there is fresh data in the ring buffer. The driver, which desires to return from the hardware interrupt in as few cycles as possible, disables hardware interrupts, starts a soft IRQ poll loop if necessary, and returns from the interrupt. Incoming data from the NIC will now be processed in the poll loop until there is no more data to be read from the NIC, at which point it will re-enable the hardware interrupt. The general purpose of this design is to reduce the hardware interrupt rate from a high speed NIC.
Now here is where things get a little weird, especially if you have been looking at nice clean diagrams of the OSI model where higher levels of the stack fit cleanly on top of each other. Oh no, my friend, the real world is far more complicated than that. That NIC that you might have been thinking of as a straightforward layer 2 device, for example, knows how to direct packets from the same TCP flow to the same CPU/ring buffer. It also knows how to coalesce adjacent TCP packets into larger packets (although this capability is not used by Linux and is instead done in software). If you have ever looked at a network capture and seen a jumbo frame and scratched your head because you sure thought the MTU was 1500, this is because this processing is at such a low level it occurs before netfilter can get its hands on the packet. This packet coalescing is part of a capability known as receive offloading, and in particular lets assume that your NIC/driver has generic receive offload (GRO) enabled (which is not the only possible flavor of receive offloading), the purpose of which is to reduce the per packet overhead from your firehose NIC by reducing the number of packets that flow through the system.
So what happens next is that the poll loop keeps pulling packets off of the ring buffer (as long as more data is coming in) and handing it off to GRO to consolidate if it can, and then it gets handed off to the protocol layer. As best I know, the Linux TCP/IP stack is just trying to get the data up to the application as quickly as it can, so I think your question boils down to "Will GRO do any consolidation on my 4 packets, and are there any knobs I can turn that affect this?"
Well, the first thing you can do is disable any form of receive offloading (e.g. via ethtool), which I think should get you 4 reads of 150 bytes for 4 packets arriving like this in order, but I'm prepared to be told I have overlooked another reason why the Linux TCP/IP stack won't send such data straight to the application if the application is blocked on a read as in your example.
The other knob you have if GRO is enabled is GRO_FLUSH_TIMEOUT which is a per NIC timeout in nanoseconds which can be (and I think defaults to) 0. If it is 0, I think your packets may get consolidated (there are many details here including the value of MAX_GRO_SKBS) if they arrive while the soft IRQ poll loop for the NIC is still active, which in turn depends on many things unrelated to your four packets in your TCP flow. If non-zero, they may get consolidated if they arrive within GRO_FLUSH_TIMEOUT nanoseconds, though to be honest I don't know if this interval could span more than one instantiation of a poll loop for the NIC.
There is a nice writeup on the Linux kernel receive side here which can help guide you through the implementation.
A normal blocking receive on a TCP connection returns as soon as there is at least one byte to return to the caller. If the caller would like to receive more bytes, they can simply call the receive function again.

Simulink: Introduce delay with UDP Send/Receive

I'm building a client/server-type subsystem in a control system application using UDP Send/Receive blocks in Simulink. Data x is sent to the server via UDPSend block which is then processed at the server that returns output y.
Currently, I've both the client (a Simulink model) and the server (processing logic return in Java) resides in the localhost. Therefore, the packet exchanges essentially take near-zero time. I'd like to introduce network delay such that the packet exchanges take a varying amount of time (say due to changes in bandwidth availability), effectively simulating a scenario where the server node is located in a different geographical location.
Could someone guide me on how to achieve this? Thanks.
As a general (Simulink-independent) solution in a Windows environment, you should have a look at following tool, which "makes your network condition significantly worse, but in a managed and interactive manner."

Implement a good performing "to-send" queue with TCP

In order not to flood the remote endpoint my server app will have to implement a "to-send" queue of packets I wish to send.
I use Windows Winsock, I/O Completion Ports.
So, I know that when my code calls "socket->send(.....)" my custom "send()" function will check to see if a data is already "on the wire" (towards that socket).
If a data is indeed on the wire it will simply queue the data to be sent later.
If no data is on the wire it will call WSASend() to really send the data.
So far everything is nice.
Now, the size of the data I'm going to send is unpredictable, so I break it into smaller chunks (say 64 bytes) in order not to waste memory for small packets, and queue/send these small chunks.
When a "write-done" completion status is given by IOCP regarding the packet I've sent, I send the next packet in the queue.
That's the problem; The speed is awfully low.
I'm actually getting, and it's on a local connection (127.0.0.1) speeds like 200kb/s.
So, I know I'll have to call WSASend() with seveal chunks (array of WSABUF objects), and that will give much better performance, but, how much will I send at once?
Is there a recommended size of bytes? I'm sure the answer is specific to my needs, yet I'm also sure there is some "general" point to start with.
Is there any other, better, way to do this?
Of course you only need to resort to providing your own queue if you are trying to send data faster than the peer can process it (either due to link speed or the speed that the peer can read and process the data). Then you only need to resort to your own data queue if you want to control the amount of system resources being used. If you only have a few connections then it is likely that this is all unnecessary, if you have 1000s then it's something that you need to be concerned about. The main thing to realise here is that if you use ANY of the asynchronous network send APIs on Windows, managed or unmanaged, then you are handing control over the lifetime of your send buffers to the receiving application and the network. See here for more details.
And once you have decided that you DO need to bother with this you then don't always need to bother, if the peer can process the data faster than you can produce it then there's no need to slow things down by queuing on the sender. You'll see that you need to queue data because your write completions will begin to take longer as the overlapped writes that you issue cannot complete due to the TCP stack being unable to send any more data due to flow control issues (see http://www.tcpipguide.com/free/t_TCPWindowSizeAdjustmentandFlowControl.htm). At this point you are potentially using an unconstrained amount of limited system resources (both non-paged pool memory and the number of memory pages that can be locked are limited and (as far as I know) both are used by pending socket writes)...
Anyway, enough of that... I assume you already have achieved good throughput before you added your send queue? To achieve maximum performance you probably need to set the TCP window size to something larger than the default (see http://msdn.microsoft.com/en-us/library/ms819736.aspx) and post multiple overlapped writes on the connection.
Assuming you already HAVE good throughput then you need to allow a number of pending overlapped writes before you start queuing, this maximises the amount of data that is ready to be sent. Once you have your magic number of pending writes outstanding you can start to queue the data and then send it based on subsequent completions. Of course, as soon as you have ANY data queued all further data must be queued. Make the number configurable and profile to see what works best as a trade off between speed and resources used (i.e. number of concurrent connections that you can maintain).
I tend to queue the whole data buffer that is due to be sent as a single entry in a queue of data buffers, since you're using IOCP it's likely that these data buffers are already reference counted to make it easy to release then when the completions occur and not before and so the queuing process is made simpler as you simply hold a reference to the send buffer whilst the data is in the queue and release it once you've issued a send.
Personally I wouldn't optimise by using scatter/gather writes with multiple WSABUFs until you have the base working and you know that doing so actually improves performance, I doubt that it will if you have enough data already pending; but as always, measure and you will know.
64 bytes is too small.
You may have already seen this but I wrote about the subject here: http://www.lenholgate.com/blog/2008/03/bug-in-timer-queue-code.html though it's possibly too vague for you.