How can I get a callback when there is some data to read on a boost.asio stream without reading it into a buffer?

It seems that since boost 1.40.0 there has been a change to the way that the async_read_some() call works.
Previously, you could pass in a null_buffer and you would get a callback when there was data to read, but without the framework reading the data into any buffer (because there wasn't one!). This basically allowed you to write code that acted like a select() call, where you would be told when your socket had some data on it.
In the new code the behaviour has been changed to work in the following way:
If the total size of all buffers in the sequence mb is 0, the asynchronous read operation shall complete immediately and pass 0 as the argument to the handler that specifies the number of bytes read.
This means that my old way of detecting data on the socket (which, incidentally, is the method shown in this official example) no longer works. The problem for me is that I need a way of detecting this, because I've layered my own streaming classes on top of the asio socket streams and so cannot just read data off the sockets that my streams will expect to be there. The only workaround I can think of right now is to read a single byte, store it, and return that byte first when my stream classes later request some bytes: not pretty.
Does anyone know of a better way to implement this kind of behaviour under the latest boost.asio code?

A quick test of the official example against boost 1.41 works for me, so I think this should still work (provided you use null_buffers).
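The select()-style behaviour the question asks for, being told that data is waiting without consuming any of it, can be illustrated outside of asio. The following is a minimal sketch in Python using the standard selectors module and a local socketpair; it is not boost.asio code, and wait_readable is a hypothetical helper name chosen for illustration.

```python
import selectors
import socket

def wait_readable(sock, timeout=1.0):
    """Return True once sock has data waiting, without reading any of it."""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        # select() returns a list of ready (key, events) pairs, empty on timeout.
        return bool(sel.select(timeout))

a, b = socket.socketpair()
a.sendall(b"hello")
ready = wait_readable(b)   # notification only; nothing is consumed
data = b.recv(16)          # a layered stream can still read every byte
a.close()
b.close()
```

Because the readiness check never touches the payload, whatever stream class sits on top of the socket still sees all of the bytes, which is the property the single-byte workaround was trying to approximate.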


Why would one need to use the `MSG_WAITALL` flag instead of `0`? Why use it with UDP?

At some point when coding sockets one will face the receive family of functions (recv, recvfrom, recvmsg).
These functions accept a flags argument, for which I see MSG_WAITALL used in many examples on the web, such as this example on UDP.
Here is the definition of the MSG_WAITALL flag:
MSG_WAITALL (since Linux 2.2)
This flag requests that the operation block until the full request is satisfied. However, the call may still return less data than requested if a signal is caught, an error or disconnect occurs, or the next data to be received is of a different type than that returned. This flag has no effect for datagram sockets.
Hence, my two questions:
Why would one need to use the MSG_WAITALL flag instead of 0? (Could someone describe a problem for which using it would be the solution?)
Why use it with UDP?
As the quoted man page mentions, MSG_WAITALL has no effect on UDP sockets, so there's no reason to use it there. Examples that do use it are probably confused and/or the result of several generations of cargo-cult/copy-and-paste programming. :)
For TCP, OTOH, the default behavior of recv() is to block until at least one byte of data can be copied into the user's buffer from the socket's incoming-data buffer. The TCP stack will try to provide as many bytes of data as it can, of course, but when the socket's incoming-data buffer contains fewer bytes than the user asked recv() for, the TCP stack will copy as many bytes as it has and return a byte count indicating how many it actually provided.
However, some people would prefer to have their recv() call keep blocking until all of the bytes in their passed-in array have been filled in, regardless of how long that might take. For those people, the MSG_WAITALL flag provides a simple way to obtain that behavior. (The flag is not strictly necessary, since the programmer could always emulate that behavior by writing a while() loop that calls recv() as many times as necessary, until all the bytes in the buffer have been populated... but it's provided as a convenience nonetheless.)
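The while() loop mentioned above might look like the following. This is a minimal sketch in Python rather than C, using a local socketpair in place of a real TCP connection; recv_exact and slow_sender are hypothetical names introduced only for this illustration.

```python
import socket
import threading
import time

def recv_exact(sock, n):
    """Emulate MSG_WAITALL: keep calling recv() until n bytes have arrived."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # peer closed before the full request was satisfied
            raise ConnectionError("closed with %d bytes still missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def slow_sender(sock):
    # Deliver the 8-byte message in two pieces, so one recv() may come up short.
    sock.sendall(b"abcd")
    time.sleep(0.05)
    sock.sendall(b"efgh")

a, b = socket.socketpair()
t = threading.Thread(target=slow_sender, args=(a,))
t.start()
looped = recv_exact(b, 8)   # blocks until all 8 bytes are in
t.join()
a.close()
b.close()
```

On platforms that define the flag, sock.recv(8, socket.MSG_WAITALL) asks the kernel for the same behaviour on a stream socket, subject to the exceptions listed in the man page excerpt (signals, errors, disconnects).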

How can I get the remote address from an incoming message on UDP listener socket?

Although it's possible to read from a Gio.Socket by wrapping its file descriptor in a Gio.DataInputStream, using Gio.Socket.receive_from() in GJS to receive is not possible because, as commented here:
GJS will clone array arguments before passing them to the C-code which will make the call to Socket.receive_from work and return the number of bytes received as well as the source of the packet. The buffer content will be unchanged as buffer actually read into is a freed clone.
Thus, input arguments are cloned and data will be written to the cloned buffer, not the instance of buffer actually passed in.
Although reading from a data stream is not a problem, Gio.Socket.receive_from() is the only way I can find to get the remote address from a UDP listener, since Gio.Socket.remote_address will be undefined. Unfortunately as the docs say for Gio.Socket.receive():
For G_SOCKET_TYPE_DATAGRAM [...] If the received message is too large to fit in buffer, then the data beyond size bytes will be discarded, without any explicit indication that this has occurred.
So if I try something like Gio.Socket.receive_from(new Uint8Array(0), null); just to get the address, the packet is swallowed, but if I read via the file-descriptor I can't tell where the message came from. Is there another non-destructive way to get the incoming address for a packet?
Since you’re using a datagram socket, it should be possible to use Gio.Socket.receive_message() and pass the Gio.SocketMsgFlags.PEEK flag to it. This isn’t possible for a stream-based socket, but you are not going to want the sender address for each read you do in that case.
If you want improved performance, you may be able to use Gio.Socket.receive_messages(), although I am not sure whether that’s completely introspectable at the moment.
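The effect of the peek flag can be seen at the plain socket level. This is a sketch in Python using the standard socket module's MSG_PEEK on a loopback UDP socket, not GJS or Gio code; the port numbers are chosen by the OS and the variable names are illustrative only.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))       # let the OS pick a free port
receiver.settimeout(2.0)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.bind(("127.0.0.1", 0))
sender_addr = sender.getsockname()
sender.sendto(b"ping", receiver.getsockname())

# Peek: learn who sent the datagram, but leave it queued on the socket.
peeked, peek_addr = receiver.recvfrom(2048, socket.MSG_PEEK)
# A normal receive afterwards still returns the same, complete datagram.
payload, addr = receiver.recvfrom(2048)

sender.close()
receiver.close()
```

The peek reports the sender's address while the datagram stays in the queue, so a later read (through whatever path) still gets the full message.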

How to serialize/deserialize objects sent over the network in Haskell?

I see that there are many ways to serialize/deserialize Haskell objects:
Data.Serialize -> encode, decode functions
MsgPack, JSON, BSON, etc
In my application, I want to setup a simple TCP client-server, where client may send serialized Haskell record objects. How does one decide between these serialization alternatives?
Additionally, when objects serialized into strings are sent over the network using Network.Socket, strings are returned. Is there a slightly higher level library, that works at the level of whole TCP messages? In other words, is there a way to avoid writing parsing code on the receive end that:
collects the results of a sequence of recv() calls,
detects that a whole object has been received, and
then parses it into a Haskell type?
In my application, the objects are not expected to be too large (maybe about ~1MB max).
As for the second part of your question, two things are required:
1. An incremental parser that doesn't need to have the whole document in memory to start parsing, and which can be fed with the partial chunks of data arriving from the wire. Also, when the parsing succeeds it must return any "leftover data" along with the parsed value.
2. A source of data with "pushback capabilities", that allows you to "unread" any leftovers so that they are available to the next parsing attempt.
The most popular library providing (1) is attoparsec. As for (2), all the three main streaming libraries (conduit, io-streams, and pipes) offer some kind of pushback functionality (the latter using the auxiliary pipes-parse package). All three libraries can integrate with attoparsec parsers as well (see here, here and here).
(Another option, of course, is to prepend each message with its length and read only that exact number of bytes.)
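The length-prefix option can be sketched as follows. The example is in Python rather than Haskell, and the 4-byte big-endian header, along with the names send_msg, recv_exact and recv_msg, are assumptions made for illustration; the same shape should carry over to a recv loop written against Network.Socket.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian length prefix

def send_msg(sock, payload):
    """Send one message as <length><payload>."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock, n):
    """Collect the results of repeated recv() calls until n bytes are in."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf.extend(chunk)
    return bytes(buf)

def recv_msg(sock):
    """Read the header, then exactly as many bytes as it announces."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)

a, b = socket.socketpair()
send_msg(a, b"first")
send_msg(a, b"second message")
first = recv_msg(b)    # message boundaries survive the byte stream
second = recv_msg(b)
a.close()
b.close()
```

Since the receiver never reads past the end of a message, there are no leftovers to push back, at the cost of having to know each message's size before sending it.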
To answer the first part of your question (about data serialization), I would say that everything you listed sounds fine. Since you are dealing with pretty big (1MB) serializations, I think that the most important thing is laziness. The cereal library (the package that provides Data.Serialize) has strict serializations, and you wouldn't want that, because you'd need to build the whole value up in memory before sending it out. I'll give a shout out to aeson, which you can use with GHC Generics to get something simple like this:
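{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson (FromJSON, ToJSON)
import GHC.Generics (Generic)
data Shape = Rect Int Int | Circle Double | Other String Int
  deriving (Generic)
instance FromJSON Shape -- uses the Generic-based default implementation
instance ToJSON Shape -- uses the Generic-based default implementation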
And then, bam!, you've got access to the encode and decode methods. I don't know about a higher level TCP library. Hopefully, someone else will have more insight on that.

Remove read topic from DDS

I have a problem with subscribing to data (using the Java platform). When a subscriber has received data for a topic, that data should be removed from the DDS. But in my case, whenever I subscribe, the same data is received many times. The data is not removed from the DDS. I tried with QoS but I don't know how to use it.
Please suggest how I can remove the read data from the DDS.
This behavior is not caused by your QoS settings, but by your method of accessing the DataReader. When you retrieve your data, you are probably calling something like the following read() in a loop:
FooReader.read(dataSeq, infoSeq, 10,
    ANY_SAMPLE_STATE, ANY_VIEW_STATE, ANY_INSTANCE_STATE);
The read() method invoked like this will return all currently available samples in your FooReader. After the read(), those samples still remain available in the FooReader, that is how the read() method behaves. Think of a read as a "peek". The next time that you invoke the read() method in this way, you will see all samples that you saw before, unless they have been overwritten by a new update from a DataWriter.
To resolve your issue, you could replace the read() with a take(), like this:
FooReader.take(dataSeq, infoSeq, 10,
    ANY_SAMPLE_STATE, ANY_VIEW_STATE, ANY_INSTANCE_STATE);
The take() method is different from the read() method in that it does a destructive read; it not only reads the data but also removes it from FooReader. That way, you will never receive the same sample twice. In fact, if you consistently use take() as opposed to read(), you will never be able to see any sample twice.
Another way to resolve your issue is to stick with read(), but adjust the requested SAMPLE_STATE, from ANY to NOT_READ, like this:
FooReader.read(dataSeq, infoSeq, 10,
    NOT_READ_SAMPLE_STATE, ANY_VIEW_STATE, ANY_INSTANCE_STATE);
That way, you will only read samples that you have not read previously. The difference with take() in this case is that the data does remain available in your FooReader, which might be useful if you want to re-read it at a later stage (in which case you need to use the ANY sample state as opposed to NOT_READ to obtain previously read samples).
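The difference between the three access patterns can be made concrete with a toy model. The following Python sketch is not the DDS API: ToyReader, deliver and the not_read_only parameter are hypothetical, and the model ignores instances, history depth and overwriting by newer updates. It only mimics the "peek", "destructive read" and "unread only" behaviours described above.

```python
class ToyReader:
    """Toy stand-in for a DataReader's sample cache (not the DDS API)."""

    def __init__(self):
        self._samples = []  # each entry is [value, already_read]

    def deliver(self, value):
        """Simulate a sample arriving from a DataWriter."""
        self._samples.append([value, False])

    def read(self, not_read_only=False):
        """Return samples and mark them read; they stay in the cache."""
        out = []
        for sample in self._samples:
            if not_read_only and sample[1]:
                continue  # skip samples already seen by an earlier read
            out.append(sample[0])
            sample[1] = True
        return out

    def take(self):
        """Return all samples and remove them from the cache."""
        out = [sample[0] for sample in self._samples]
        self._samples.clear()
        return out

reader = ToyReader()
reader.deliver("a")
reader.deliver("b")
first_read = reader.read()                 # sees a and b
second_read = reader.read()                # sees a and b again
reader.deliver("c")
unread = reader.read(not_read_only=True)   # sees only c
taken = reader.take()                      # sees a, b, c and empties the cache
after_take = reader.read()                 # nothing left
```

The repeated data in the question corresponds to second_read; switching to take(), or to read() restricted to unread samples, makes each sample show up only once.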

An IOCP documentation interpretation question - buffer ownership ambiguity

Since I'm not a native English speaker I might be missing something so maybe someone here knows better than me.
Taken from WSASend's documentation at MSDN:
lpBuffers [in]
A pointer to an array of WSABUF structures. Each WSABUF structure contains a pointer to a buffer and the length, in bytes, of the buffer. For a Winsock application, once the WSASend function is called, the system owns these buffers and the application may not access them. This array must remain valid for the duration of the send operation.
OK, see the sentence that says the system owns these buffers? That's the unclear spot!
I can think of two translations for this line (might be something else, you name it):
Translation 1 - "buffers" refers to the OVERLAPPED structure that I pass this function when calling it. I may reuse the object again only when getting a completion notification about it.
Translation 2 - "buffers" refer to the actual buffers, those with the data I'm sending. If the WSABUF object points to one buffer, then I cannot touch this buffer until the operation is complete.
Can anyone tell what's the right interpretation to that line?
And..... If the answer is the second one - how would you resolve it?
Because to me it implies that for each and every buffer I send I must retain a copy of it on the sender side, thus having MANY "pending" buffers (of different sizes) in a high-traffic application, which is really going to hurt "scalability".
Statement 1:
In addition to the above paragraph (the "And...."), I thought that IOCP copies the data to be sent into its own buffer and sends from there, unless you set SO_SNDBUF to zero.
Statement 2:
I use stack-allocated buffers (you know, something like char cBuff[1024]; in the function body). If the answer to the main question is the second option (i.e. the buffers must stay as they are until the send is complete), then... that really screws things up big-time! Can you think of a way to resolve it? (I know, I asked this in other words above.)
The answer is that the overlapped structure and the data buffer itself cannot be reused or released until the completion for the operation occurs.
This is because the operation completes asynchronously: even if the data is eventually copied into operating-system-owned buffers in the TCP/IP stack, that may not happen until some time in the future, and the write completion is how you are told that it has. Note that write completions may be delayed for a surprising amount of time if you're sending without explicit flow control and relying on the TCP stack to do flow control for you (see here: some OVERLAPS using WSASend not returning in a timely manner using GetQueuedCompletionStatus?) ...
You can't use stack allocated buffers unless you place an event in the overlapped structure and block on it until the async operation completes; there's not a lot of point in doing that as you add complexity over a normal blocking call and you don't gain a great deal by issuing the call async and then waiting on it.
In my IOCP server framework (which you can get for free from here) I use dynamically allocated buffers which include the OVERLAPPED structure and which are reference counted. This means that the cleanup (in my case they're returned to a pool for reuse) happens when the completion occurs and the reference is released. It also means that you can choose to continue to use the buffer after the operation and the cleanup is still simple.
See also here: I/O Completion Port, How to free Per Socket Context and Per I/O Context?