Why can file descriptors under UNIX be transmitted over sockets, but not over pipes?

I just learned about pipes and sockets today and that sockets are special because they allow you to transmit file descriptors between processes.
I've also found that it's sendmsg() and the msghdr structure that are used to produce this behavior.
My professor told me that pipes can't be used to replicate this behavior, but I'm interested in exactly what part of the implementation allows sockets to do what pipes can't.
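For reference, the mechanism in question looks roughly like this on the sending side; a minimal sketch in C, assuming a connected AF_UNIX socket sock and a descriptor fd_to_send (the names are mine, not from any particular codebase):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send fd_to_send across the AF_UNIX socket 'sock' as ancillary data.
 * The kernel, not the byte stream, carries the descriptor: it installs
 * a duplicate into the receiver's descriptor table. A sketch only;
 * error handling is minimal. */
int send_fd(int sock, int fd_to_send)
{
    char dummy = '*';  /* at least one byte of ordinary data must go along */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {            /* correctly aligned buffer for the control message */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    memset(&control, 0, sizeof(control));
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = control.buf, .msg_controllen = sizeof(control.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;  /* "rights", i.e. file descriptors */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

The part pipes lack is exactly this ancillary-data (SCM_RIGHTS) channel: a pipe's write() moves only bytes, with no side channel through which the kernel could duplicate a descriptor into the peer process.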

Related

pcap - Proper capitalization when referring to the file standard?

How does one properly refer to a Packet Capture file in short hand when writing about it for documentation?
I see a mix between PCAP, PCap and pcap in various areas and wikis.
The proper way to refer to a packet capture file is "a packet capture file"; "PCAP"/"PCap"/"pcap" are often used to refer to a particular type of packet capture file: those written in the format that libpcap/WinPcap supports for writing. There are several other capture file formats: one of them (pcap-ng) can be read by Wireshark and by libpcap 1.1.0 and later; several others can be read by Wireshark; and some can't be read by Wireshark at all.
What I (as a core developer of libpcap, tcpdump, and Wireshark) would say is the proper way to refer to files in the aforementioned format is "pcap", with no extra capitalization; the "pcap" comes from "libpcap", not directly from "packet capture", and "libpcap" is not capitalized (it's a UN*X library, and those tend to have all-lower-case names, given that almost all UN*X file systems are case-sensitive).
Others may call it "PCAP", perhaps because so many terms in the computer and networking fields are acronyms or other initialisms that they assume "PCAP" must be one as well. Still others call it "PCap", because they think of it as standing for "Packet Capture" rather than referring to libpcap and WinPcap. But, then again, people also referred to Sun Microsystems as "SUN"; the name did come from the Stanford University Network project, but the company wasn't "Stanford University Network Microsystems", it was just "Sun Microsystems".

Difference between Socket recv, sysread and POSIX::read in sockets?

I've found at least three ways to read from a nonblocking socket in Perl:
$socket->recv
$socket->sysread
POSIX::read($socket,...
They look like three different names for the same thing. I've read the documentation but I can't find one big difference. Anyone?
sysread is stream (TCP) oriented (it doesn't care about where one send ends and another begins), and recv is datagram (UDP) oriented (it does care).
POSIX::read works on file descriptors, whereas sysread works on file handles.
The best source of documentation on recv() is man recvfrom - it is basically a Perl interface to that system call. Note that recv() is usually used on sockets that are set up non-connection-oriented (i.e. a UDP socket), but it may also be used on connection-oriented (i.e. TCP) sockets.
The main differences between read(), sysread() and POSIX::read() are:
read(...) takes a file handle and the IO is buffered
sysread(...) takes a file handle and the IO is not buffered
POSIX::read(...) takes a file descriptor and the IO is not buffered
A file descriptor is a value (a small integer) that is returned by POSIX::open().
Also, you can get the file descriptor of a perl file handle via the fileno() function.
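If it helps, the same layering exists in C, where a Perl file handle corresponds to stdio's buffered FILE * and a file descriptor to the raw integer used by the read() system call; a rough sketch, for analogy only (the filename is made up):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[64];
    /* Buffered, like Perl's read() on a file handle. */
    FILE *fh = fopen("data.txt", "r");
    if (fh == NULL)
        return 1;
    size_t n = fread(buf, 1, sizeof(buf), fh);  /* goes through stdio's buffer */
    /* Unbuffered, like POSIX::read() on a file descriptor. */
    int fd = open("data.txt", O_RDONLY);
    if (fd == -1)
        return 1;
    ssize_t m = read(fd, buf, sizeof(buf));     /* exactly one read(2) call */
    /* fileno() maps a FILE * to its descriptor, as in Perl. */
    printf("read %zu buffered, %zd unbuffered, fd %d\n", n, m, fileno(fh));
    fclose(fh);
    close(fd);
    return 0;
}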

Atomic write on a unix socket?

I'm trying to choose between pipes and unix sockets for an IPC mechanism.
Both support the select() and epoll() functions, which is great.
Now, pipes have a 4kB (as of today) "atomic" write, which is guaranteed by the Linux kernel.
Does such a feature exist in the case of unix sockets? I couldn't find any document stating this explicitly.
Say I use a UNIX socket and I write x bytes of data from my client. Can I be sure that these x bytes will be available at the server end of the socket when my server's select() returns?
On the same subject, would using SOCK_DGRAM ensure that writes are atomic (if such a guarantee is possible), since datagrams are supposed to be single well-defined messages?
What, then, would be the difference when using SOCK_STREAM as the transfer mode?
Thanks in advance.
Pipes
Yes, the atomic write limit is usually 4KB, but for maximum portability you'd probably be better off using the PIPE_BUF constant. An alternative is to use non-blocking I/O.
More information than you want to know in man 7 pipe.
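To make the PIPE_BUF point concrete, here is a minimal sketch of a writer that stays inside the atomicity guarantee (the helper name is mine):

#include <limits.h>  /* PIPE_BUF: at least 512 by POSIX, 4096 on Linux */
#include <unistd.h>

/* Write one record to a pipe atomically. POSIX guarantees that a
 * write() of at most PIPE_BUF bytes to a pipe is not interleaved
 * with writes from other processes; larger writes may be split. */
ssize_t write_record(int pipe_fd, const void *buf, size_t len)
{
    if (len > PIPE_BUF)
        return -1;  /* refuse: atomicity is no longer guaranteed */
    return write(pipe_fd, buf, len);
}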
Unix datagram sockets
Writes using the send family of functions on datagram sockets are indeed guaranteed to be atomic. In the case of Linux, they're reliable as well, and they preserve ordering (which makes the recent introduction of SOCK_SEQPACKET a bit confusing to me). There is much more information about this in man 7 unix.
The maximum datagram size is socket-dependent. It's accessed using getsockopt/setsockopt on SO_SNDBUF. On Linux systems, it ranges between 2048 and wmem_max, with a default of wmem_default. For example, on my system, wmem_default = wmem_max = 112640 (you can read them from /proc/sys/net/core). The most relevant documentation about this is in man 7 socket, around the SO_SNDBUF option. I recommend you read it yourself, as the capacity-doubling behavior it describes can be a bit confusing at first.
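A sketch of inspecting and raising the buffer, which also shows the doubling behavior mentioned above (the function name is mine):

#include <stdio.h>
#include <sys/socket.h>

/* Query SO_SNDBUF, ask for a new size, and query again. On Linux the
 * kernel doubles the value passed to setsockopt() to allow for
 * bookkeeping overhead, so getsockopt() reports twice what you set. */
void tune_sndbuf(int sock, int wanted)
{
    int size;
    socklen_t len = sizeof(size);
    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &size, &len);
    printf("SO_SNDBUF before: %d\n", size);
    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &wanted, sizeof(wanted));
    len = sizeof(size);
    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &size, &len);
    printf("SO_SNDBUF after asking for %d: %d\n", wanted, size);
}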
Practical differences between stream and datagram
Stream sockets only work connected. This mostly means they can only communicate with one peer at a time. As streams, they're not guaranteed to preserve "message boundaries".
Datagram sockets are disconnected. They can (theoretically) communicate with multiple peers at a time. They preserve message boundaries.
[I suppose the new SOCK_SEQPACKET is in between: connected and boundary-preserving.]
On Linux, both are reliable and preserve message ordering. If you use them to transmit stream data, they tend to perform similarly. So just use the one that matches your flow, and let the kernel handle buffering for you.
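A quick way to see the boundary difference for yourself is a socketpair() in each mode; a sketch with error checks omitted:

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sv[2];
    char buf[64];
    ssize_t n;
    /* Datagram pair: each recv() returns exactly one send(). */
    socketpair(AF_UNIX, SOCK_DGRAM, 0, sv);
    send(sv[0], "hello", 5, 0);
    send(sv[0], "world", 5, 0);
    n = recv(sv[1], buf, sizeof(buf), 0);
    printf("SOCK_DGRAM  recv: %zd bytes\n", n);  /* 5: boundaries kept */
    close(sv[0]); close(sv[1]);
    /* Stream pair: consecutive sends may be merged into one recv(). */
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    send(sv[0], "hello", 5, 0);
    send(sv[0], "world", 5, 0);
    n = recv(sv[1], buf, sizeof(buf), 0);
    printf("SOCK_STREAM recv: %zd bytes\n", n);  /* typically 10 */
    close(sv[0]); close(sv[1]);
    return 0;
}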
Crude benchmark comparing stream, datagram, and pipes:
# unix stream 0:05.67
socat UNIX-LISTEN:u OPEN:/dev/null &
until [[ -S u ]]; do :;done
time socat OPEN:large-file UNIX-CONNECT:u
# unix datagram 0:05.12
socat UNIX-RECV:u OPEN:/dev/null &
until [[ -S u ]]; do :;done
time socat OPEN:large-file UNIX-SENDTO:u
# pipe 0:05.44
socat PIPE:p,rdonly=1 OPEN:/dev/null &
until [[ -p p ]]; do :;done
time socat OPEN:large-file PIPE:p
Nothing statistically significant here. My bottleneck is likely reading large-file.
Say I use a UNIX socket and I write x bytes of data from my client. Can I be sure that these x bytes will be available at the server end of the socket when my server's select() returns?
If you are using an AF_UNIX SOCK_STREAM socket, there is no such guarantee; that is, data written in one write()/send() call may require more than one read()/recv() call on the receiving side.
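So a stream receiver has to loop until it has gathered a full message; a sketch of the usual pattern (the helper name is mine):

#include <sys/socket.h>

/* Read exactly 'len' bytes from a SOCK_STREAM socket. A loop is needed
 * because one sender-side write()/send() may arrive split across
 * several recv() calls. Returns 0 on success, -1 on error or EOF. */
int recv_all(int sock, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(sock, p, len, 0);
        if (n <= 0)
            return -1;  /* n < 0: error; n == 0: peer closed */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}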
On the same subject, would using SOCK_DGRAM ensure that writes are atomic (if such a guarantee is possible), since datagrams are supposed to be single well-defined messages?
On the other hand, AF_UNIX SOCK_DGRAM sockets are required to preserve datagram boundaries and to be reliable. You should get an EMSGSIZE error if send() cannot transmit the datagram atomically. I'm not sure what happens for write(), as the man page does not say that it can report EMSGSIZE (although man pages sometimes do not list all errors returned). I would try overflowing the receiver's buffer with big datagrams to see exactly which errors send()/write() report.
One advantage of using UNIX sockets over pipes is the bigger buffer size. I don't remember exactly what the limit of the pipe's kernel buffer is, but I remember not having enough of it and not being able to increase it (at the time it was a hardcoded kernel constant; since Linux 2.6.35, fcntl() with F_SETPIPE_SZ can resize it). fast_producer_process | slow_consumer_process was orders of magnitude slower than fast_producer_process > file && slow_consumer_process < file due to insufficient pipe buffer size.
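For completeness, a sketch of that newer fcntl() interface (Linux 2.6.35 and later only):

#define _GNU_SOURCE  /* F_GETPIPE_SZ / F_SETPIPE_SZ, Linux >= 2.6.35 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int p[2];
    if (pipe(p) == -1)
        return 1;
    /* Default capacity is 64 KiB (16 pages) on modern Linux. */
    printf("before: %d bytes\n", fcntl(p[0], F_GETPIPE_SZ));
    /* Ask for 1 MiB; the kernel rounds up to a power-of-two number of
     * pages, capped by /proc/sys/fs/pipe-max-size for unprivileged
     * processes. */
    if (fcntl(p[0], F_SETPIPE_SZ, 1048576) == -1)
        perror("F_SETPIPE_SZ");
    printf("after:  %d bytes\n", fcntl(p[0], F_GETPIPE_SZ));
    return 0;
}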

Named pipe similar to "mkfifo" creation, but bidirectional

I'd like to create a named pipe, like the one created by "mkfifo", but with one caveat: I want the pipe to be bidirectional. That is, I want process A to write to the fifo and process B to read from it, and vice-versa. A pipe created by "mkfifo" allows process A to read back the data it has itself written to the pipe. Normally I'd use two pipes, but I am trying to simulate an actual device, so I'd like the semantics of open(), read(), write(), etc. to be as similar to the actual device as possible. Does anyone know of a technique to accomplish this without resorting to two pipes or a named socket?
A pty ("pseudo-terminal") can do this; see man pty.
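A sketch of the pty route, which yields a single bidirectional, device-like endpoint with a pathname the other process can simply open():

#define _XOPEN_SOURCE 600  /* posix_openpt() and friends */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Master side: open the pty multiplexer. */
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master == -1 || grantpt(master) == -1 || unlockpt(master) == -1)
        return 1;
    /* The slave end has a pathname (e.g. /dev/pts/3) that process B
     * can open() like a real device; reads and writes on the slave
     * are paired bidirectionally with the master. */
    printf("process B should open: %s\n", ptsname(master));
    /* ... read()/write() on 'master' here ... */
    return 0;
}

Note that a pty comes with terminal semantics (echo, canonical mode, signal characters), which you would usually disable with tcsetattr() in raw mode if all you want is a byte channel.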
Use a Unix-domain socket.
Oh, you said you don't want to use the only available solution - a Unix-domain socket.
In that case, you are stuck with opening two named pipes, or doing without. Or write your own device driver for them, of course - you could do it for the open source systems, anyway; it might be harder for the closed source systems (Windows, AIX, HP-UX).

Why does writing to an unconnected socket send SIGPIPE first?

There are so many possible errors in the POSIX environment. Why do some of them (like writing to an unconnected socket in particular) get special treatment in the form of signals?
This is by design, so that simple programs producing text (e.g. find, grep, cat) used in a pipeline would die when their consumer dies. That is, if you're running a chain like find | grep | sed | head, head will exit as soon as it reads enough lines. That will kill sed with SIGPIPE, which will kill grep with SIGPIPE, which will kill find with SIGPIPE. If there were no SIGPIPE, naively written programs would continue running and producing content that nobody needs.
If you don't want to get SIGPIPE in your program, just ignore it with a call to signal(). After that, syscalls like write() that hit a broken pipe will return with errno=EPIPE instead.
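A minimal sketch of that idiom:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Ignore SIGPIPE process-wide; writes on a broken pipe now fail
     * with errno == EPIPE instead of killing the process. */
    signal(SIGPIPE, SIG_IGN);
    int fds[2];
    if (pipe(fds) == -1)
        return 1;
    close(fds[0]);  /* no reader left: the pipe is now "broken" */
    if (write(fds[1], "x", 1) == -1 && errno == EPIPE)
        fprintf(stderr, "got EPIPE instead of SIGPIPE\n");
    return 0;
}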
See this SO answer for a detailed explanation of why writing to a closed descriptor/socket generates SIGPIPE: "Why is writing a closed TCP socket worse than reading one?"
SIGPIPE isn't specific to sockets; as the name suggests, it is also sent when you try to write to a pipe (anonymous or named). I guess the reason for the separate error-handling behaviour is that broken pipes shouldn't always be treated as an error (whereas, for example, trying to write to a file that doesn't exist should always be treated as an error).
Consider the program less. This program reads input from stdin (unless a filename is specified) and only shows part of it at a time. If the user scrolls down, it will try to read more input from stdin, and display that. Since it doesn't read all the input at once, the pipe will be broken if the user quits (e.g. by pressing q) before the input has all been read. This isn't really a problem, though, so the program that's writing down the pipe should handle it gracefully.
It's up to the design.
In the beginning, people used signals to deliver event notifications to user space; later this became unnecessary, because more popular patterns such as polling emerged, which don't require the caller to install a signal handler.