Non-atomic message sent over Unix domain socket with file descriptor. Is FD sent twice? - sockets

I am developing a client-server application in which the client sends different types of messages to the server. One message type carries a file descriptor that is to be passed between processes.
The POSIX API pages say little about sendmsg and recvmsg. My question: if a message is too big to be sent atomically, is the attached file descriptor sent with every piece of the message, or only with the first one?
What confuses me is that on connected sockets, if messages are sent in quick succession, the kernel may merge them; the file descriptors (integers) would then have to be merged in step with the messages as well.

UNIX domain sockets support passing file descriptors or process credentials to ... The send(2) MSG_MORE flag is not supported by UNIX domain sockets. ... For historical reasons the ancillary message types listed below are specified with ... To pass file descriptors or credentials over a SOCK_STREAM
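A minimal sketch of the mechanics, assuming Python 3 on a Unix-like system. It does not settle the partial-send question, but it shows that the descriptor travels as ancillary data (SCM_RIGHTS) attached to one sendmsg call rather than as part of the byte stream. The helper names send_fd, recv_fd, and demo are hypothetical.

```python
import array
import os
import socket


def send_fd(sock, payload, fd):
    """Send payload plus one file descriptor as SCM_RIGHTS ancillary data."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                  array.array("i", [fd]).tobytes())]
    return sock.sendmsg([payload], ancillary)


def recv_fd(sock, bufsize=4096):
    """Receive data and any file descriptors that arrived with it."""
    fds = array.array("i")
    # Reserve ancillary buffer space for a single descriptor (an assumption).
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return data, list(fds)


def demo():
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    send_fd(a, b"take this", w)       # pass the pipe's write end
    data, fds = recv_fd(b)
    os.write(fds[0], b"hello")        # write through the received descriptor
    got = os.read(r, 5)
    for fd in (r, w, fds[0]):
        os.close(fd)
    a.close()
    b.close()
    return data, len(fds), got
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is why writing to it shows up on the original pipe.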

Related

How to Enable Timestamp Option in IP Header

I am designing an application layer protocol on top of UDP. One of requirements is that the receiving side should keep only the most up to date datagram.
Therefore, if datagram A was sent and then datagram B was sent, but datagram B was received first, datagram A should be discarded by the application when received.
One way to implement this is a counter stored in the data part of the UDP packet. The counter is incremented each time a datagram is sent.
I also noticed that IP options contain a timestamp option which looks suitable for this task.
My questions are (in the context of BSD-like sockets):
How do I enable this option on the sending side?
How do I read this field on the receiving side?
You can set IP options with setsockopt(), using option level IPPROTO_IP and specifying the name of the option. See the Unix/Linux IP documentation. Reading IP header options generally requires a RAW socket, which in turn usually requires root permissions. Relying on IP options is not advisable: they are very rarely used, so they may not be supported either at the originating system or at the systems the packet passes through.
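The counter approach from the question can be sketched as follows. This is an illustration under stated assumptions, not the asker's protocol: the 8-byte big-endian header and the names encode and LatestOnly are hypothetical, and counter wraparound is ignored.

```python
import struct

# Hypothetical header: one unsigned 64-bit sequence number, network byte order.
HEADER = struct.Struct("!Q")


def encode(seq, payload):
    """Prefix the payload with its sequence number."""
    return HEADER.pack(seq) + payload


class LatestOnly:
    """Receiver-side filter that keeps only datagrams newer than any seen."""

    def __init__(self):
        self.last_seq = -1

    def accept(self, datagram):
        """Return the payload if the datagram is newer, else None."""
        if len(datagram) < HEADER.size:
            return None  # too short to carry a header
        (seq,) = HEADER.unpack_from(datagram)
        if seq <= self.last_seq:
            return None  # stale or duplicate: discard
        self.last_seq = seq
        return datagram[HEADER.size:]
```

The sender would pass encode(counter, data) to sendto() and increment the counter; the receiver would feed each result of recvfrom() to accept(). If datagram B (sequence 2) arrives before datagram A (sequence 1), A is dropped.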

What methods are available on unix for pub sub IPC?

There are various options for IPC.
Over a network:
for client-server, can use TCP
for pub sub, can use UDP multicast
Locally:
for client-server, can use unix domain sockets
for pub sub, can use ???
I suppose what I'd be interested in is some kind of file descriptor that supports many readers (subscribers) and many writers (publishers) simultaneously. Is this usage pattern feasible/efficient on unix?
After much googling I haven't found a whole lot in the way of ipc multicast, so I have decided to write a program pubsub that takes as arguments a publisher address and a subscriber address, listens and accepts connections on these 2 addresses, and then for each payload received on a publisher connection write it to each of the subscriber connections. It wouldn't surprise me if this is inefficient or reinventing the wheel but I have not come across a better solution.
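The core of such a relay might look like the sketch below. It assumes Unix datagram sockets so that each recv returns exactly one payload; relay_once and demo are hypothetical names, and the accept loop, error handling, and slow-subscriber handling that a real program would need are left out.

```python
import socket


def relay_once(publisher, subscribers, bufsize=65536):
    """Read one payload from a publisher and copy it to every subscriber."""
    payload = publisher.recv(bufsize)
    for sub in subscribers:
        sub.send(payload)
    return payload


def demo():
    # Each socketpair stands in for one accepted connection to the relay.
    pub_app, pub_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    sub1_side, sub1_app = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    sub2_side, sub2_app = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

    pub_app.send(b"tick")
    relay_once(pub_side, [sub1_side, sub2_side])
    received = [sub1_app.recv(65536), sub2_app.recv(65536)]

    for s in (pub_app, pub_side, sub1_side, sub1_app, sub2_side, sub2_app):
        s.close()
    return received
```

The cost is one extra copy per subscriber in user space, which is the inefficiency the question worries about.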
I was looking for solutions to a similar problem and found /dev/fanout. Fanout is a kernel module that replicates its input to all processes reading from it. You can think of it as an IPC broadcast mechanism. According to the author, it works well for small data payloads. Multiple processes can write to the device and multiple processes can read from it. I am not sure about the atomicity of writes, though; small writes from multiple processes should occur atomically, as with FIFOs.
More about Fanout:
http://compgroups.net/comp.linux.development.system/-dev-fanout-a-one-to-many-multi/2869739
http://www.linuxtoys.org/fanout/fanout.html
There are POSIX message queues too. As man mq_overview puts it:
POSIX message queues allow processes to exchange data in the form of messages. This API is distinct from that provided by System V message queues (msgget(2), msgsnd(2), msgrcv(2), etc.), but provides similar functionality.
Message queues are created and opened using mq_open(3); this function returns a message queue descriptor (mqd_t), which is used to refer to the open message queue in later calls. Each message queue is identified by a name of the form /somename; that is, a null-terminated string of up to NAME_MAX (i.e., 255) characters consisting of an initial slash, followed by one or more characters, none of which are slashes. Two processes can operate on the same queue by passing the same name to mq_open(3).
Messages are transferred to and from a queue using mq_send(3) and mq_receive(3). When a process has finished using the queue, it closes it using mq_close(3), and when the queue is no longer required, it can be deleted using mq_unlink(3).
Queue attributes can be retrieved and (in some cases) modified using mq_getattr(3) and mq_setattr(3). A process can request asynchronous notification of the arrival of a message on a previously empty queue using mq_notify(3).
A message queue descriptor is a reference to an open message queue description (cf. open(2)). After a fork(2), a child inherits copies of its parent's message queue descriptors, and these descriptors refer to the same open message queue descriptions as the corresponding descriptors in the parent. Corresponding descriptors in the two processes share the flags (mq_flags) that are associated with the open message queue description.
Each message has an associated priority, and messages are always delivered to the receiving process highest priority first.
Message priorities range from 0 (low) to sysconf(_SC_MQ_PRIO_MAX) - 1 (high). On Linux, sysconf(_SC_MQ_PRIO_MAX) returns 32768, but POSIX.1 requires only that an implementation support at least priorities in the range 0 to 31; some implementations provide only this range.
A more friendly introduction by Michael Kerrisk is available here: http://man7.org/conf/lca2013/IPC_Overview-LCA-2013-printable.pdf

Will select (or epoll) mark a socket as readable if there is data on the socket prior to adding the socket to the list of monitored file descriptors

I’m seeking help to understand the following situation:
1. I have a TCP socket connection established with the peer.
2. I add it to the list of file descriptors to be monitored by select.
3. select alerts me to any activity on the socket, and my application processes the data sent by the peer.
4. I now remove the socket's file descriptor from the list of file descriptors monitored by select.
5. The peer sends me some data on that socket. I do not read that data.
6. After a few seconds, I again add the socket's file descriptor to the list of file descriptors monitored by select.
7. Will select now immediately let me know that the socket is readable? What if, in step 5, the peer does not send me any data but instead sends, say, a FIN? Will select still tell me that the socket is readable?
In summary, the question is whether select (or any of its variants such as epoll) indicates that a socket is readable if there was activity on the socket before it was included in the list of monitored file descriptors, assuming the application has read no data from the socket.
Will select now immediately let me know that the socket is readable?
Yes.
What if, in step 5, the peer does not send me any data but instead sends, say, a FIN? Will select still tell me that the socket is readable?
Yes.
In summary, the question is whether select (or any of its variants such as epoll) indicates that a socket is readable if there was activity on the socket before it was included in the list of monitored file descriptors, assuming the application has read no data from the socket.
Yes.
NB 'Prior to including the socket in the list of monitored file descriptors' doesn't really mean anything. The operating system doesn't know when you did that. It only knows that you called select() with that list.
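A small experiment illustrates both answers. It is a sketch that assumes a local socketpair behaves like the TCP connection in the question with respect to readiness; readable_now and demo are hypothetical names.

```python
import select
import socket


def readable_now(sock):
    """Poll with a zero timeout: is the socket readable right now?"""
    ready, _, _ = select.select([sock], [], [], 0)
    return bool(ready)


def demo():
    peer, mine = socket.socketpair()
    before = readable_now(mine)       # nothing pending yet

    peer.sendall(b"x")                # data arrives while nobody is selecting
    with_data = readable_now(mine)    # first select call after the data arrived

    mine.recv(1)                      # drain the byte
    peer.close()                      # the peer closes its end
    after_close = readable_now(mine)  # end-of-stream also counts as readable
    eof = mine.recv(1)                # recv returns b"" at end-of-stream

    mine.close()
    return before, with_data, after_close, eof
```

select reports the state of the socket at the time of the call, so it makes no difference whether the socket was in an earlier descriptor set.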

Sending And Receiving Sockets (TCP/IP)

I know that multiple packets may be queued in the buffer waiting to be read, and that a long packet may require a loop of multiple send attempts to be fully sent. But I have questions about packaging in these cases:
If I call recv (or any alternative low-level function) when multiple packets are waiting to be read, does it return them all stacked in my buffer, or only one of them (or part of the first one if my buffer is too small)?
If I send a long packet that requires multiple iterations to be fully sent, does it count as a single packet or as multiple packets? In other words, is there anything that marks a sent packet as incomplete?
These questions came to mind when I thought about WebSocket packaging. Special characters are used to mark the beginning and end of a packet, which suggests that it is otherwise not possible to tell multiple packets apart.
P.S. All the questions are about TCP/IP, but you are welcome to share information about UDP as well.
TCP sockets are stream based. The order is guaranteed but the number of bytes you receive with each recv/read could be any chunk of the pending bytes from the sender. You can layer a message based transport on top of TCP by adding framing information to indicate the way that the payload should be chunked into messages. This is what WebSockets does. Each WebSocket message/frame starts with at least 2 bytes of header information which contains the length of the payload to follow. This allows the receiver to wait for and re-assemble complete messages.
For example, in libraries/interfaces that implement the standard WebSocket API or a similar API (such as a browser), the onmessage event fires once for each message received, and the data attribute of the event contains the entire message.
Note that in the older Hixie version of WebSockets, each frame started with '\x00' and ended with '\xff'. The current standardized version of the protocol, IETF RFC 6455 (HyBi), uses header information that contains the length, which makes the frames much easier to process (but note that both the old and the new versions are message based and have basically the same API).
A TCP connection provides a stream of bytes, so treat it as such. No application message boundaries are preserved: one send can correspond to multiple receives and the other way around. You need loops on both sides.
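Those loops, together with a simple length prefix, can be sketched as follows. The 4-byte big-endian prefix is an assumption chosen for illustration and is not the WebSocket frame format; send_msg, recv_exact, recv_msg, and demo are hypothetical names.

```python
import socket
import struct

# Hypothetical framing: 4-byte unsigned length, then the payload.
LENGTH_PREFIX = struct.Struct("!I")


def send_msg(sock, payload):
    """sendall loops internally until every byte has been handed to the kernel."""
    sock.sendall(LENGTH_PREFIX.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Loop on recv until exactly n bytes have arrived."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Read one length-prefixed message, however the bytes were chunked."""
    (length,) = LENGTH_PREFIX.unpack(recv_exact(sock, LENGTH_PREFIX.size))
    return recv_exact(sock, length)


def demo():
    a, b = socket.socketpair()
    send_msg(a, b"hello")
    send_msg(a, b"world!")  # may sit in the same buffer as the first message
    messages = [recv_msg(b), recv_msg(b)]
    a.close()
    b.close()
    return messages
```

Even if both messages are already waiting in the receive buffer, recv_msg hands them back one at a time because it never asks for more bytes than the current message needs.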
UDP, on the other hand, is datagram (i.e. message) based. Here one read will always dequeue a single datagram (unless you mess with low-level flags on the socket). Even if your application buffer is smaller than the pending datagram and you read only a part of it, the rest of it is lost. The way around this is to limit the size of the datagrams you send to something below the usual Ethernet MTU of 1500 bytes (less the IP and UDP headers, so 1472 bytes of payload).

Sockets Async Connection

I am new to asynchronous socket connections. Can you please explain how this technology works?
There is an existing application (server) that requires socket connections to transmit data back and forth. I have already created my application (.NET), but the server application doesn't seem to understand the XML data that I am sending. My documentation gives me two ports, one to send and another to receive.
I need to be sure that I understand how this works.
I have the IP addresses and the two ports to be used.
A socket is the most "raw" way you can use to send byte-level TCP and UDP packets across a network.
For example, your browser uses a TCP socket connection to connect to the StackOverflow web server on port 80. Your browser and the server exchange commands and data according to an agreed-on structure/protocol (in this case, HTTP). An asynchronous socket is no different from a synchronous socket except that it does not block the thread that's using it.
This is really not the most ideal way to work (check and see if your server/vendor application supports SOAP/Web Services, etc), but if this is really the only way, there could be a number of reasons why it's failing. To name a few...
Not actually getting connected or sending data. Run a test using WinsockTool (http://www.isatools.org/tools/winsocktool.msi) and simulate your client first to make sure the server is working as expected.
Encoding incorrect - You're sending raw bytes across the network... Make sure you're using the correct encoding to convert your XML into bytes (ASCII, UTF8, etc).
Buffer Length - Your sending buffer (the amount of data you can transmit in one shot) may be too small or the server may expect a content of a certain length, and your XML could be getting truncated.
let's break a misconception... sockets are FULL-DUPLEX: you connect to a server using one port, then you can send AND receive data through the same socket, no need for 2 port numbers. (actually, there is a port assigned for receiving data, but it is: 1. assigned automatically when creating the socket (unless told otherwise) and 2. of no use in the function calls to receive data)
so you tell us that your documentation gives you 2 port numbers... i assume that the "server" is an already existing in-house application, and you are trying to talk to it. if the doc lists 2 ports, then you will need 2 sockets: one for sending and another one for receiving. now i would suggest you first use a synchronous socket before trying the async way: a synchronous socket is less error-prone for a first test.
(by the way, let's break another misconception: if well coded, once a server listens on a port, it can receive any number of connections through the same port number, no need to open 2 listening ports to accept 2 connections... sorry for the re-alignment, but i've seen those 2 errors committed enough times, it gives me an urge to kill)
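Both points can be illustrated with a short asynchronous sketch. It is written in Python's asyncio rather than the asker's .NET, and the line-based "ack:" protocol and the names handle, ask, and demo are hypothetical, not the vendor's actual protocol.

```python
import asyncio


async def handle(reader, writer):
    """Serve one connection: receive and reply on the same socket."""
    line = await reader.readline()
    writer.write(b"ack:" + line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def ask(port, text):
    """Client side: send a request and read the reply on one connection."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(text + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply


async def demo():
    # Port 0 asks the OS for any free port; a single listening port is
    # enough to serve both clients.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    replies = await asyncio.gather(ask(port, b"<a/>"), ask(port, b"<b/>"))
    server.close()
    await server.wait_closed()
    return replies
```

Running asyncio.run(demo()) performs two concurrent request/reply exchanges against the one listening port, each over a single full-duplex connection and without blocking a thread while waiting.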