Socket read with pcap

I have a socket bound to a NIC that I am using to capture packets in a pcap_loop.
I have a separate process running that eventually does a "read" on that same device, but only after a unix local pipe is ready to be read. Is it correct to say that the read() on the device from the 2nd process will read everything that's ready, not just one packet at a time, even though my other process is set up to use pcap_loop to read a packet at a time?

I have a socket bound to a NIC that I am using to capture packets in a pcap_loop.
You say "socket", so I'm guessing that this is Linux (it could also be IRIX, but that's a lot less likely, and the answer is the same in either case; libpcap doesn't use sockets on other OSes, where the native capture mechanism isn't socket-based).
I have a separate process running that eventually does a "read" on that same device, but only after a unix local pipe is ready to be read. Is it correct to say that the read() on the device from the 2nd process will read everything that's ready, not just one packet at a time,
No. A PF_PACKET socket returns one packet at a time from a read().
There is, by the way, no guarantee that reading from the socket with read() while libpcap is handling the same socket will work. Libpcap might be using the memory-mapped mechanism to get the packets; unless you've seen documentation on how the memory-mapped mechanism interacts with read()s done elsewhere, or have read enough of the Linux kernel code to figure out how it works, you might not want to assume it'll work the way you want.
If, however, this is FreeBSD, as suggested (but not stated) by the tag, then what libpcap is using is a BPF device, *NOT* a socket. A read() will give you an entire bufferful of packets, and the read()s done by libpcap will give libpcap an entire bufferful of packets, even if it happens to call your callback once per packet. The same issues of read() vs. memory-mapped access could occur, but the memory-mapped BPF in later versions of FreeBSD isn't, by default, used by libpcap.
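For illustration, here is a minimal sketch (assuming Linux, and assuming the second process opens its own PF_PACKET socket; this is not the asker's actual code) showing that a read() yields exactly one frame:

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>

int main(void) {
    /* Requires root/CAP_NET_RAW; an unbound PF_PACKET socket sees frames
       from all interfaces. */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }
    unsigned char frame[65536];
    ssize_t n = read(fd, frame, sizeof frame);  /* returns ONE frame, not a batch */
    if (n < 0) perror("read");
    else printf("read a single packet of %zd bytes\n", n);
    close(fd);
    return 0;
}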

Related

What is the difference among these methods to check when a NIC receives a packet?

I'm trying to benchmark the performance of a SolarFlare NIC, especially when using Onload. To do this, I've been looking for ways to check the time at which a packet arrives, and I've found several methods that can be used in a UNIX environment:
1. After receiving a packet through the socket, call ioctl() on that socket with the option SIOCGSTAMP. This returns the time at which the last packet arrived on the socket.
2. Using setsockopt(), set the option SO_TIMESTAMPNS on the socket. Then, when calling recvmsg(), fetch the cmsg and check the timestamp written in it.
3. Same as 2, but use the option SO_TIMESTAMPING with the flags SOF_TIMESTAMPING_RX_SOFTWARE and SOF_TIMESTAMPING_SOFTWARE.
4. Same as 3, but with the flags SOF_TIMESTAMPING_RX_HARDWARE and SOF_TIMESTAMPING_RAW_HARDWARE.
But I cannot figure out the difference between these four methods, or what is really going on with each option. My guess is that 1/2/3 use the kernel clock and 4 uses the NIC's own clock. But I'm not sure...
Can you explain precisely the difference between the options above, and possibly other methods to check the time at which a packet is received?
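For what it's worth, here is a minimal sketch of method 2 (assuming Linux; recv_with_timestamp is a hypothetical name), showing where the timestamp arrives:

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/socket.h>

/* Enable SO_TIMESTAMPNS, then read the kernel receive timestamp
   from the control-message (cmsg) area filled in by recvmsg(). */
static void recv_with_timestamp(int fd) {
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof on);

    char data[2048];
    char ctrl[CMSG_SPACE(sizeof(struct timespec))];
    struct iovec iov = { .iov_base = data, .iov_len = sizeof data };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl, .msg_controllen = sizeof ctrl };
    if (recvmsg(fd, &msg, 0) < 0) { perror("recvmsg"); return; }

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMPNS) {
            struct timespec ts;
            memcpy(&ts, CMSG_DATA(c), sizeof ts);
            printf("packet received at %ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);
        }
    }
}

Methods 3 and 4 are read the same way, except that the option is SO_TIMESTAMPING and the control message carries a struct scm_timestamping with separate software and hardware timestamp slots.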

When is a file descriptor not considered available for writing?

When, exactly, does the BSD socket send() function return to the caller?
In non-blocking mode, it should return immediately, correct?
As for blocking mode, the man page says:
When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in non-blocking I/O mode.
Questions:
Does this mean that the send() call will always return immediately if there is room in the kernel send buffer?
Is the behavior and performance of the send() call identical for TCP and UDP? If not, why not?
Does this mean that the send() call will always return immediately if there is room in the kernel send buffer?
Yes, as long as "immediately" means "after the memory you provided has been copied into the kernel's buffer". Which, in some edge cases, may not be so immediate: for instance, if the pointer you pass in triggers a page fault that has to pull the buffer in from a memory-mapped file or from swap, that adds a significant delay before the call returns.
Is the behavior and performance of the send() call identical for TCP and UDP? If not, why not?
Not quite. Possible performance differences depend on the OS's implementation of the TCP/IP stack. In theory the UDP socket could be slightly cheaper, since the OS needs to do fewer things with it.
EDIT: On the other hand, since you can send much more data per system call with TCP, the cost per byte can typically be a lot lower with TCP. For UDP this can be mitigated with sendmmsg() in recent Linux kernels.
As for the behavior, it's nearly identical.
For blocking sockets, both TCP and UDP will block until there's space in the kernel buffer. The distinction however is that the UDP socket will wait until your entire buffer can be stored in the kernel buffer, whereas the TCP socket may decide to only copy a single byte into the kernel buffer (typically it's more than one byte though).
If you try to send packets that are larger than 64kiB, a UDP socket will likely consistently fail with EMSGSIZE. This is because UDP, being a datagram socket, guarantees to send your entire buffer as a single IP packet (or train of IP packet fragments) or not send it at all.
Non-blocking sockets behave identically to the blocking versions, with the single exception that instead of blocking (when there's not enough space in the kernel buffer), the calls fail with EAGAIN (or EWOULDBLOCK). When this happens, it's time to put the socket back into epoll/kqueue/select (or whatever you're using) to wait for it to become writable again.
As usual when working on POSIX, keep in mind that your call may fail with EINTR (if the call was interrupted by a signal). In this case you most likely want to call send() again.
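To make that pattern concrete, here is a minimal sketch of a send-everything helper for a non-blocking socket; send_all() and wait_writable() are hypothetical names, and wait_writable() stands in for whatever poll()/select()/epoll wait you use:

#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

int wait_writable(int fd);  /* hypothetical: block until fd is writable */

/* Queue all of buf on a non-blocking socket, retrying on EINTR and
   waiting for writability on EAGAIN/EWOULDBLOCK. Returns 0 or -1. */
int send_all(int fd, const char *buf, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n >= 0) { sent += (size_t)n; continue; }
        if (errno == EINTR) continue;             /* interrupted: just retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            if (wait_writable(fd) < 0) return -1; /* kernel buffer still full */
            continue;
        }
        return -1;                                /* real error */
    }
    return 0;
}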
If there is room in the kernel buffer, then send() copies as many bytes as it can into the buffer and exits immediately, returning how many bytes were actually copied (which can be fewer than how many you requested). If there is no room in the kernel buffer, then send() blocks until either room becomes available or a timeout occurs (if one is configured).
The send() will return as soon as the data has been accepted by the kernel.
In the case of a blocking socket: send() will block if the kernel buffer does not have enough free space to take the data passed to the send() call.
Non-blocking sockets: send() will not block, but will instead fail and return -1, or it may return the number of bytes that were partially copied (depending on the buffer space available), setting errno to EWOULDBLOCK or EAGAIN. This means that at the time of the send() the buffer could not take all the bytes, and you should use a select() call to wait until the socket is writable and then send() the remaining data. Alternatively, you could loop with a sleep() and call send() again, but then you have to keep track of how many bytes were actually sent and how many remain to be sent.
Does this mean that the send() call will always return immediately if there is room in the kernel send buffer?
Shouldn't it? The moment at which the data "is sent" can be defined in different ways. I think it is the moment when the OS accepts your data for delivery down the stack; otherwise it's quite difficult to define at all. Is it the moment the data is transferred into the network card's buffer? Or the moment it is pushed out of the network card's buffer?
Is there a problem for which you need to know this for sure, or are you just curious?
Your presumption is correct. If there is room in the kernel send buffer, the kernel will copy the data into the send buffer and send() will return.

What's the read logic when I call the recvfrom() function in C/C++?

I wrote a C++ program to create a socket and bind it to receive ICMP/UDP packets. The code I wrote is as follows:
while (true) {
    ssize_t recv_size = recvfrom(sockId, rePack, sizeof(rePack), 0,
                                 (struct sockaddr *)&raddr, (socklen_t *)&len);
    processPacket(recv_size);
}
So I used an endless while loop to receive messages continually, but I am worried about the following two questions:
1. How long will a message be kept in the receive queue (or, say, in the NIC queue)?
I worry that if it takes too long to process the first message, I might miss the second one. So how quickly should I issue one read after another?
2. How do I prevent reading duplicate messages?
That is, does the receive queue know about me: when my thread finishes reading the first message, will the queue automatically give me the second one? Or, put another way, when I read the first message, is it deleted from the queue so that no one can receive it again?
Additionally, I think the while(true) pattern is not good; could anyone give me a good suggestion, please? (I have heard of something like a polling model.)
First, you should always check the return value from recvfrom. It's unlikely the recvfrom will fail, but if it does (for example, if you later implement signal handling, it might fail with EINTR) you will be processing undefined data. Also, of course, the return value tells you the size of the packet you received.
For question 1, the actual answer is operating system-dependent. However, most operating systems will buffer some number of packets for you. The OS interrupt handler that handles the incoming packet will never be copying it directly into your application level buffer, so it will always go into an OS buffer first. The OS has previously noted your interest in it (by virtue of creating the socket and binding it you expressed interest), so it will then place a pointer to the buffer onto a queue associated with your socket.
A different part of the OS code will then (after the interrupt handler has completed) copy the data from the OS buffer into your application memory, free the OS buffer, and return to your program from the recvfrom system call. If additional packets come in, either before or after you have started processing the first one, they'll be placed on the queue too.
That queue is not infinite, of course. It's likely that you can configure how many packets (or how much buffer space) can be reserved, either at a system-wide level (think sysctl-type settings in Linux) or at the individual socket level (setsockopt / ioctl), as sketched below.
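As a sketch of the per-socket knob (the 4 MiB figure is arbitrary, and the kernel may silently cap the request, e.g. at net.core.rmem_max on Linux):

#include <sys/socket.h>

/* Ask the kernel to reserve a larger receive buffer for this socket. */
void grow_rcvbuf(int sockId) {
    int rcvbuf = 4 * 1024 * 1024;  /* arbitrary 4 MiB request */
    setsockopt(sockId, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof rcvbuf);
}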
If, when you call recvfrom, there are already queued packets on the socket, the system call handler will not block your process, instead it will simply copy from the OS buffer of the next queued packet into your buffer, release the OS buffer, and return immediately. As long as you can process incoming packets roughly as fast as they arrive or faster, you should not lose any. (However, note that if another system is generating packets at a very high rate, it's likely that the OS memory reserved will be exhausted at some point, after which the OS will simply discard packets that exceed its resource reservation.)
For question 2, you will receive no duplicate messages (unless something upstream of your machine is actually duplicating them). Once a queued message is copied into your buffer, it's released before returning to you. That message is gone forever.
(Note that it's possible that some other process has also created a socket expressing interest in the same packets. That process would also get a copy of the packet data, which is typically handled internal to the operating system by reference counting rather than by actually duplicating the OS buffers, although that detail is invisible to applications. In any case, once all interested processes have received the packet, it will be discarded.)
There's really nothing at all wrong with a while (true) loop; it's a very common control structure for long-running server-type programs. If your program has nothing else it needs to be doing in the meantime, a while (true) loop that simply blocks in recvfrom is the simplest, and hence clearest, way to implement it.
(You could use a select(2) or poll(2) call to wait. This allows you to handle waiting for any one of multiple file descriptors at the same time, or to periodically "time out" and go do something else, say, but again if you have nothing else you might need to be doing in the meantime, that is introducing needless complication.)
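For completeness, a minimal poll(2) sketch of that alternative, assuming sockId is the socket from the question and recv_loop is a hypothetical wrapper:

#include <poll.h>
#include <stdio.h>

/* Block for up to one second waiting for a datagram; on timeout,
   do periodic housekeeping instead of sitting in recvfrom(). */
void recv_loop(int sockId) {
    struct pollfd pfd = { .fd = sockId, .events = POLLIN };
    for (;;) {
        int ready = poll(&pfd, 1, 1000);  /* timeout in milliseconds */
        if (ready < 0) { perror("poll"); break; }
        if (ready == 0) { /* timed out: do other periodic work here */ continue; }
        if (pfd.revents & POLLIN) {
            /* recvfrom() will now return a packet without blocking */
        }
    }
}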

How do sockets avoid missing arriving data?

A typical socket program example would look like this:
while (1) {
    data = socket.recv()
    // do some work
}
Since you don't know when a packet will arrive, the call must block and wait until some data comes in on the listening port. Now suppose the program starts some heavy work after receiving a command from the other side. While it is doing that work, another packet arrives; because at that moment you are busy working rather than listening on the port, you might miss that packet no matter how fast you handle the work.
So how does a socket manage to handle all the data without any loss?
The operating system has a receive buffer which holds packets that have been received from the network but not yet recv()ed by the application. If that buffer fills up, packets will be lost. You don't have to be in a recv() call when packets arrive, though you should make sure you call it often enough to keep the buffer from overflowing.
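If you want to detect such drops from inside the application, one option is a sketch like the following, using the Linux-specific SO_RXQ_OVFL socket option (an illustration assuming a Linux datagram socket; not portable):

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* SO_RXQ_OVFL makes the kernel attach a cumulative drop counter to each
   received datagram, so the application can see buffer overflows. */
static void recv_and_check_drops(int fd) {
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_RXQ_OVFL, &on, sizeof on);

    char data[2048];
    char ctrl[CMSG_SPACE(sizeof(uint32_t))];
    struct iovec iov = { .iov_base = data, .iov_len = sizeof data };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl, .msg_controllen = sizeof ctrl };
    if (recvmsg(fd, &msg, 0) < 0) return;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SO_RXQ_OVFL) {
            uint32_t dropped;
            memcpy(&dropped, CMSG_DATA(c), sizeof dropped);
            fprintf(stderr, "kernel has dropped %u datagrams so far\n", dropped);
        }
    }
}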

Socket Read In Multi-Threaded Application Returns Zero Bytes or EINTR (104)

I've been a C coder for a while now; neither a newbie nor an expert. I have a daemonized application in C on a PPC Linux system. I use PHP's socket_connect as a client to connect to this service locally. The server uses epoll for multiplexing connections via a Unix socket. A user-submitted string is parsed for certain characters/words using strstr() and, if they are found, the server spawns 4 joinable threads to different websites simultaneously. I use socket, connect, write and read to interact with the webservers via TCP on their port 80 in each thread. All connections and writes seem successful. Reads from the webserver sockets fail, however, with either (A) 3 of the threads seeming to hang and only one thread returning -1 with errno set to 104, the responding thread taking something like 10 minutes, an eternity (*I read somewhere that 104 (is it EINTR?) in a network context suggests 'the connection was reset by peer'); or (B) 0 bytes from 3 threads, with only 1 of the 4 threads actually returning some data. Isn't socket read/write thread-safe? I use thread-safe (and reentrant) libc functions such as strtok_r, gethostbyname_r, etc.
*I doubt that the webhosts in question are actually resetting the connection, because when I run a single-threaded standalone version (everything else being equal) everything works perfectly, though of course in series rather than in parallel.
There's a second problem too (oops): I can't write back to the client that connects to my epoll-ed Unix socket. My daemon application hangs and hogs the CPU (>100%) forever, yet nothing is written to the client's end. I'm sure the client (a very typical PHP socket application) hasn't closed the connection whenever this happens, and no errors are detected either. Any ideas?
I cannot figure out what is wrong even with Valgrind, GDB, or extensive logging. Kindly help where you can.
Yes, read/write are thread-safe. But beware of gethostbyname() and getservbyname() if you're using them - they return pointers to static data, and may not be thread-safe.
errno 104 is ECONNRESET, not EINTR. Use strerror() or perror() to get the textual error message ('Connection reset by peer') for a particular errno code.
The best way to figure out what's going wrong is often to do very detailed logging - log the results of every operation, plus details like the IP address/port connecting to, the number of bytes read/written, the thread id, and so forth. And, of course, make sure your logging code is thread-safe :-)
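As a sketch of such a logging helper (log_errno is a hypothetical name; it uses strerror_r(), the thread-safe variant of strerror(), and assumes the POSIX rather than the GNU variant):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Log a failed call with its errno text without touching strerror()'s
   shared static buffer. Assumes the POSIX (XSI) strerror_r(). */
void log_errno(const char *op) {
    char msg[256];
    if (strerror_r(errno, msg, sizeof msg) != 0)
        snprintf(msg, sizeof msg, "errno %d", errno);
    fprintf(stderr, "[thread %lu] %s failed: %s\n",
            (unsigned long)pthread_self(), op, msg);
}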
Getting an ECONNRESET after 10 minutes sounds like the result of your connection timing out. Either the web server isn't sending the data or your app isn't receiving it.
To test the former, hook up a program like Wireshark to the local loopback device and look for traffic to and from the port you are using.
For the latter, take a look at the epoll() man page. It mentions a scenario where using edge-triggered events could result in a lockup, because there is still data in the buffer but no new data comes in, so no new event is triggered.
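The usual way out of that edge-triggered trap is to drain the socket until read() reports EAGAIN before going back to epoll_wait(). A minimal sketch, assuming a non-blocking fd registered with EPOLLIN | EPOLLET (drain_socket is a hypothetical name):

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* With EPOLLET, an event fires only on NEW data, so we must keep
   reading until the kernel buffer is empty (read() gives EAGAIN). */
void drain_socket(int fd) {
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { /* process n bytes of buf here */ continue; }
        if (n == 0) { /* peer closed the connection */ break; }
        if (errno == EINTR) continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK) break;  /* drained */
        perror("read");  /* real error */
        break;
    }
}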