Out-of-order FIN packet and overwrite?

While going through an open-source codebase, I thought of an interesting scenario.
Let's say that, after the TCP connection has been successfully established, a TCP client should send a packet with sequence number 101. Instead, it sends a FIN with sequence number 201. The TCP server treats the FIN as out of order, queues it, and waits for the missing data packet to arrive.
My question is: according to the RFC, how should a TCP endpoint behave if the server then receives a data packet with sequence number 101 and length 150? Does the data overwrite the FIN sent earlier? Does the server trim the data packet at the FIN's sequence number? Or is it dependent on the TCP implementation?

According to these paragraphs from RFC 793:
"3. If the connection is in a synchronized state (ESTABLISHED,
FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT),
any unacceptable segment (out of window sequence number or
unacceptable acknowledgment number) must elicit only an empty
acknowledgment segment containing the current send-sequence number
and an acknowledgment indicating the next sequence number expected
to be received, and the connection remains in the same state."
....
"A natural way to think about processing incoming segments is to
imagine that they are first tested for proper sequence number (i.e.,
that their contents lie in the range of the expected "receive window"
in the sequence number space) and then that they are generally queued
and processed in sequence number order.
When a segment overlaps other already received segments we reconstruct
the segment to contain just the new data, and adjust the header fields
to be consistent."
...
My Response:
Remember that if this happens, it is because of a badly behaving TCP on the client: not because of the reordering itself, but because of the wrong sequence number in the segment with the FIN flag. Or it may be an attack.
When the server-side TCP receives the segment with SEQ=201, it stores the segment for a limited time and sends back an ACK for 101, because 101 is the sequence number it is still waiting for.
Then, after the segment with SEQ=101 arrives, the receiving TCP has a new receive window. From the earlier segment with SEQ=201, it should take only the data from byte 251 onward, if any, and accept the FIN. (In my test it did not do this; instead it removed the overlapping bytes from the segment with SEQ=101. This might be implementation dependent.) The receiving TCP then sends back an ACK. When the socket is closed on the server side, the receiving TCP sends a [FIN, ACK] segment.
To test it, I wrote a client that does exactly what you described. It uses raw sockets in user space, because this cannot be simulated with ordinary TCP sockets; the server is a simple Node.js server. The client sends the FIN segment and, 15 seconds later, the preceding data segment. The server reads the received data and closes the socket after 10 seconds.
Here is the tcpdump output, where you can see the TCP server-side responses:
[rodolk#localhost ~]$ sudo tcpdump -i p2p1 -vv tcp
tcpdump: listening on p2p1, link-type EN10MB (Ethernet), capture size 65535 bytes
19:33:03.648216 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
192.168.56.101.16345 > 192.168.56.1.webcache: Flags [S], cksum 0x5f49 (correct), seq 523645, win 500, length 0
19:33:03.649826 IP (tos 0x0, ttl 128, id 26590, offset 0, flags [DF], proto TCP (6), length 44)
192.168.56.1.webcache > 192.168.56.101.16345: Flags [S.], cksum 0x1ac8 (correct), seq 1576251572, ack 523646, win 8192, options [mss 1460], length 0
19:33:03.651208 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
192.168.56.101.16345 > 192.168.56.1.webcache: Flags [.], cksum 0x5091 (correct), seq 1, ack 1, win 500, length 0
19:33:03.651567 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 74)
192.168.56.101.16345 > 192.168.56.1.webcache: Flags [F.], cksum 0x8121 (correct), seq 122:156, ack 1, win 500, length 34
19:33:03.651891 IP (tos 0x0, ttl 128, id 26591, offset 0, flags [DF], proto TCP (6), length 40)
192.168.56.1.webcache > 192.168.56.101.16345: Flags [.], cksum 0x5314 (correct), seq 1, ack 1, win 65392, length 0
19:33:18.652083 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 171)
192.168.56.101.16345 > 192.168.56.1.webcache: Flags [P.], cksum 0xf973 (correct), seq 1:132, ack 1, win 500, length 131
19:33:18.652834 IP (tos 0x0, ttl 128, id 26593, offset 0, flags [DF], proto TCP (6), length 40)
192.168.56.1.webcache > 192.168.56.101.16345: Flags [.], cksum 0x5313 (correct), seq 1, ack 157, win 65237, length 0
19:33:28.661041 IP (tos 0x0, ttl 128, id 26594, offset 0, flags [DF], proto TCP (6), length 40)
192.168.56.1.webcache > 192.168.56.101.16345: Flags [F.], cksum 0x5312 (correct), seq 1, ack 157, win 65237, length 0
19:33:28.961756 IP (tos 0x0, ttl 128, id 26595, offset 0, flags [DF], proto TCP (6), length 40)
192.168.56.1.webcache > 192.168.56.101.16345: Flags [F.], cksum 0x5312 (correct), seq 1, ack 157, win 65237, length 0

Related

Strange behavior of socket-bind in kubernetes container

I have this simple socket client:
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sock;
    ssize_t valread;
    struct sockaddr_in serv_addr;
    struct sockaddr_in address;
    const char* hello = "Hello from client";
    char buffer[1024] = { 0 };

    if ((sock = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("\n Socket creation error \n");
        return -1;
    }
    const int enable = 1;
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(enable)) < 0) {
        printf("Address reuse setup failed\n");
        return -1;
    }
    // Bind to the local source 1.2.3.4:2222; htonl() makes this byte-order independent
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = htonl(0x01020304);
    address.sin_port = htons(2222);
    if (bind(sock, (struct sockaddr*)&address, sizeof(address)) < 0) {
        perror("bind failed !! ");
        return -1;
    }
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(7777);
    if (inet_pton(AF_INET, "123.123.123.123", &serv_addr.sin_addr) <= 0) {
        printf("\nInvalid address/ Address not supported \n");
        return -1;
    }
    // connect() returns 0 on success; it does not return a new descriptor
    if (connect(sock, (struct sockaddr*)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("\nConnection Failed \n");
        return -1;
    }
    send(sock, hello, strlen(hello), 0);
    printf("Hello message sent\n");
    // Leave room for the terminating NUL so printf() stays in bounds
    valread = read(sock, buffer, sizeof(buffer) - 1);
    if (valread > 0)
        printf("%s\n", buffer);
    // closing the connected socket
    close(sock);
    return 0;
}
I'm running this in a Kubernetes container. The network configuration in the container is as follows:
Interfaces (loopback / eth0):
# ifconfig lo:40
lo:40: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 1.2.3.4 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1480
inet 111.222.11.22 netmask 255.255.255.255 broadcast 111.222.11.22
ether 12:23:34:45:67:89 txqueuelen 0 (Ethernet)
RX packets 171888 bytes 89525968 (85.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 135282 bytes 332839040 (317.4 MiB)
TX errors 0 dropped 1 overruns 0 carrier 0 collisions 0
Routing table:
# ip route
default via 222.111.1.1 dev eth0
222.111.1.1 dev eth0 scope link
tcpdump output while running the application:
# tcpdump -i any port 7777 or port 2222 -nn
tcpdump: data link type LINUX_SLL2
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
20:48:17.556210 lo In IP 1.2.3.4.2222 > 127.0.0.1.15001: Flags [S], seq 946414299, win 28800, options [mss 1440,sackOK,TS val 4246074623 ecr 0,nop,wscale 10], length 0
20:48:17.556225 lo In IP 123.123.123.123.7777 > 1.2.3.4.2222: Flags [S.], seq 2796262891, ack 946414300, win 43690, options [mss 65495,sackOK,TS val 2687581120 ecr 4246074623,nop,wscale 10], length 0
20:48:17.556237 lo In IP 1.2.3.4.2222 > 127.0.0.1.15001: Flags [.], ack 2796262892, win 29, options [nop,nop,TS val 4246074623 ecr 2687581120], length 0
20:48:17.556269 lo In IP 1.2.3.4.2222 > 127.0.0.1.15001: Flags [P.], seq 0:17, ack 1, win 29, options [nop,nop,TS val 4246074623 ecr 2687581120], length 17
20:48:17.556274 lo In IP 123.123.123.123.7777 > 1.2.3.4.2222: Flags [.], ack 18, win 43, options [nop,nop,TS val 2687581120 ecr 4246074623], length 0
20:48:17.556438 eth0 Out IP 111.222.11.22.34198 > 123.123.123.123.7777: Flags [S], seq 3447272825, win 28800, options [mss 1440,sackOK,TS val 2335942519 ecr 0,nop,wscale 10], length 0
20:48:18.613556 eth0 Out IP 111.222.11.22.34198 > 123.123.123.123.7777: Flags [S], seq 3447272825, win 28800, options [mss 1440,sackOK,TS val 2335943577 ecr 0,nop,wscale 10], length 0
20:48:20.661558 eth0 Out IP 111.222.11.22.34198 > 123.123.123.123.7777: Flags [S], seq 3447272825, win 28800, options [mss 1440,sackOK,TS val 2335945625 ecr 0,nop,wscale 10], length 0
20:48:24.693564 eth0 Out IP 111.222.11.22.34198 > 123.123.123.123.7777: Flags [S], seq 3447272825, win 28800, options [mss 1440,sackOK,TS val 2335949657 ecr 0,nop,wscale 10], length 0
20:48:27.556769 lo In IP 123.123.123.123.7777 > 1.2.3.4.2222: Flags [R.], seq 1, ack 18, win 43, options [nop,nop,TS val 2687591121 ecr 4246074623], length 0
What I'm expecting to see is a TCP SYN packet going out as follows:
TCP SYN packet of src IP: 1.2.3.4 dst IP: 123.123.123.123 going out to interface eth0.
But what I'm seeing is very strange, as you can see from the TCP dump:
First packet: SYN packet is from 1.2.3.4 to 127.0.0.1 (from lo:40 to lo, which is looping within loopback interfaces).
Second packet: The loopback (127.0.0.1) is replying back with src ip of "123.123.123.123", which is really out of the blue!
Sixth packet, I see TCP SYN again from src IP of 111.222.11.22, which is the IP of interface eth0.
I believe some unusual configuration is going on in the container, because when I run the same code on another Linux box, it behaves as expected. Could anyone give me pointers on where to look in my container?
A few more facts:
There are no iptables or nftables rules.

TCP connection reset due to multiple acknowledgements from server?

I am trying to establish TCP communication between a GPS device and a Java asynchronous NIO socket channel.
When the server reads from the channel after accepting the connection, it fails with the stack trace below.
java.util.concurrent.ExecutionException: java.io.IOException: Connection reset by peer
at sun.nio.ch.PendingFuture.get(PendingFuture.java:185)
at com.socket.Teltonika.Codec.NioSocketServer$1.completed(NioSocketServer.java:50)
at com.socket.Teltonika.Codec.NioSocketServer$1.completed(NioSocketServer.java:32)
at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
at sun.nio.ch.Invoker$2.run(Invoker.java:218)
at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:387)
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:191)
at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213)
at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:293)
When I look at the tcpdump output, I see a reset (RST) coming from the client. Under what conditions can this happen?
I also see that the server sends its SYN-ACK more than once (the same SYN-ACK appears three times in the trace). Could this be a problem?
x.x.x.x -- Gps device ip
y.y.y.y -- My server ip
17:29:38.129012 IP (tos 0x0, ttl 115, id 43952, offset 0, flags [DF], proto TCP (6), length 64)
x.x.x.x.live.vodafone.in.43187 > y.y.y.y.4241: Flags [S], cksum 0xa168 (correct), seq 1477993782, win 10880, options [mss 1348,nop,wscale 0,nop,nop,sackOK,nop,nop,TS val 983851 ecr 0], length 0
17:29:38.129056 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
y.y.y.y.4241 > x.x.x.x.live.vodafone.in.43187: Flags [S.], cksum 0xf0e9 (incorrect -> 0x92e7), seq 1324366009, ack 1477993783, win 28960, options [mss 1460,sackOK,TS val 1043927932 ecr 983851,nop,wscale 7], length 0
17:29:39.127403 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
y.y.y.y.4241 > x.x.x.x.live.vodafone.in.43187: Flags [S.], cksum 0xf0e9 (incorrect -> 0x91ed), seq 1324366009, ack 1477993783, win 28960, options [mss 1460,sackOK,TS val 1043928182 ecr 983851,nop,wscale 7], length 0
17:29:41.127397 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
y.y.y.y.4241 > x.x.x.x.live.vodafone.in.43187: Flags [S.], cksum 0xf0e9 (incorrect -> 0x8ff9), seq 1324366009, ack 1477993783, win 28960, options [mss 1460,sackOK,TS val 1043928682 ecr 983851,nop,wscale 7], length 0
17:29:42.113169 IP (tos 0x0, ttl 115, id 43953, offset 0, flags [DF], proto TCP (6), length 52)
x.x.x.x.live.vodafone.in.43187 > y.y.y.y.4241: Flags [.], cksum 0x01e6 (correct), ack 1, win 10880, options [nop,nop,TS val 984747 ecr 1043928682], length 0
17:29:43.890057 IP (tos 0x0, ttl 115, id 0, offset 0, flags [none], proto TCP (6), length 40)
x.x.x.x.live.vodafone.in.43187 > y.y.y.y.4241: Flags [R], cksum 0xac3c (correct), seq 1477993783, win 65535, length 0
When I send data to the server with the netcat (nc) command, I receive it without any problem:
echo abctest-sumangala | nc y.y.y.y 4241   # I receive the bytes abctest-sumangala

Reading MLDv2 queries using an IPv6 socket

I have mrd6 installed on my Raspberry Pi. It registers with a local interface (tun0) and periodically transmits MLDv2 queries over it.
According to [RFC3810], MLDv2 message types are a subset of ICMPv6 messages, and are identified in IPv6 packets by a preceding Next Header value of 58 (0x3a). They are sent with a link-local IPv6 Source Address, an IPv6 Hop Limit of 1, and an IPv6 Router Alert option [RFC2711] in a Hop-by-Hop Options header.
I can confirm that I'm seeing these packets periodically over tun0:
pi#machine:~ $ sudo tcpdump -i tun0 ip6 -vv -XX
01:22:52.125915 IP6 (flowlabel 0x71df6, hlim 1, next-header Options (0)
payload length: 36)
fe80::69bf:be2d:e087:9921 > ip6-allnodes: HBH (rtalert: 0x0000) (padn)
[icmp6 sum ok] ICMP6, multicast listener query v2 [max resp delay=10000]
[gaddr :: robustness=2 qqi=125]
0x0000: 6007 1df6 0024 0001 fe80 0000 0000 0000 `....$..........
0x0010: 69bf be2d e087 9921 ff02 0000 0000 0000 i..-...!........
0x0020: 0000 0000 0000 0001 3a00 0502 0000 0100 ........:.......
0x0030: 8200 b500 2710 0000 0000 0000 0000 0000 ....'...........
0x0040: 0000 0000 0000 0000 027d 0000 .........}..
I have a socket set up in my application on tun0 as follows, since I expect these to be ICMP packets:
int fd = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6); // ICMPv6
// ... bind this socket to tun0
int interfaceIndex = (int)if_nametoindex("tun0"); // tun0 interface index (<net/if.h>)
int mcastTTL = 10;
int loopBack = 1;
if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_IF,
               &interfaceIndex, sizeof(interfaceIndex)) < 0) {
    perror("setsockopt:: IPV6_MULTICAST_IF:: ");
}
if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_LOOP,
               &loopBack, sizeof(loopBack)) < 0) {
    perror("setsockopt:: IPV6_MULTICAST_LOOP:: ");
}
if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_HOPS,
               &mcastTTL, sizeof(mcastTTL)) < 0) {
    perror("setsockopt:: IPV6_MULTICAST_HOPS:: ");
}
struct ipv6_mreq mreq6;
memset(&mreq6, 0, sizeof(mreq6));
memcpy(&mreq6.ipv6mr_multiaddr.s6_addr, sourceAddress, 16); // group address to join
mreq6.ipv6mr_interface = interfaceIndex;
if (setsockopt(fd, IPPROTO_IPV6, IPV6_JOIN_GROUP,
               &mreq6, sizeof(mreq6)) < 0) {
    perror("setsockopt:: IPV6_JOIN_GROUP:: ");
}
Setting up the socket this way, I can receive ICMP echo requests, replies to my own address, and multicasts sent using the link-local multicast address. However, I don't see any MLDv2 queries.
Here's my receive loop:
uint8_t received[1000] = { 0 };
struct sockaddr_storage peerAddress = { 0 };
socklen_t addressLength = sizeof(peerAddress);
ssize_t receivedLength = recvfrom(fd,
                                  received,
                                  sizeof(received),
                                  0,
                                  (struct sockaddr *)&peerAddress,
                                  &addressLength);
if (receivedLength > 0) {
    // Never get here for MLDv2 queries.
}
Researching this a bit further, I discovered the IPV6_ROUTER_ALERT socket option, which the man page describes as follows:
IPV6_ROUTER_ALERT
Pass forwarded packets containing a router alert hop-by-hop option to this socket.
Only allowed for SOCK_RAW sockets. The tapped packets are not forwarded by the
kernel, it is the user's responsibility to send them out again. Argument is a
pointer to an integer. A positive integer indicates a router alert option value
to intercept. Packets carrying a router alert option with a value field
containing this integer will be delivered to the socket. A negative integer
disables delivery of packets with router alert options to this socket.
So I figured I was missing this option and tried setting it as follows. Per [RFC2711], a Router Alert value of 0 means the datagram contains a Multicast Listener Discovery message [RFC2710].
int routerAlertOption = 0;
if (setsockopt(fd, IPPROTO_IPV6, IPV6_ROUTER_ALERT,
               &routerAlertOption, sizeof(routerAlertOption)) < 0) {
    perror("setsockopt:: IPV6_ROUTER_ALERT:: ");
}
However, this fails with ENOPROTOOPT (errno 92). Some more searching (http://www.atm.tut.fi/list-archive/usagi-users-2005/msg00317.html) suggested that the IPV6_ROUTER_ALERT option can't be set on a socket using the IPPROTO_ICMPV6 protocol; it needs a socket created with the IPPROTO_RAW protocol.
However, defining my socket as:
int fd = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
means I'm not able to receive any ICMP packets in my recvfrom anymore.
TL;DR: How do I read MLDv2 queries using an IPv6 socket?
edit (answer):
It appears that Linux drops MLDv2 packets instead of passing them to an ICMPv6 socket. I'm not sure why; it could be because of the Hop-by-Hop next-header option.
I followed the accepted answer below and read raw packets on the tun0 interface instead, based on the ping6_ll.c example here: http://www.pdbuchan.com/rawsock/rawsock.html.
It uses a packet socket, socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)). You can also set SOL_PACKET options to filter on specific multicast rules for your interface.
From a quick look at the RFCs, things aren't looking good. Per RFC 4443 (ICMPv6), section 2.4:
2.4. Message Processing Rules
Implementations MUST observe the following rules when processing
ICMPv6 messages (from [RFC-1122]):
(b) If an ICMPv6 informational message of unknown type is received,
it MUST be silently discarded.
According to the MLDv2 spec, it uses types 130 and 143, perhaps others (I don't see more diagrams in the RFC), while the ICMPv6 types listed in RFC 4443 itself are 1, 2, 3, 4, 100, 101, 127, 128, 129, 200, 201, and 255.
It looks like the implementation (the kernel) must drop MLDv2 packets if they are to be passed to an ICMPv6 socket. Personally, I don't see much sense in making MLDv2 look like ICMPv6 if conventional implementations drop the packet anyway, but I didn't see anything that contradicts this claim.
You can certainly go deeper and use a raw socket, especially given that your stack doesn't recognize MLDv2 (perhaps there's a kernel patch to fix that?). But then you'll have to parse the IP and ICMP headers yourself.

Split pcap on multiple IP addresses

I have a large pcap file that I am trying to split, and I have a list of IP addresses. I would like to split the pcap into two smaller pcaps. One pcap will include all the packets with src equal to one of the IP addresses in my list, and one pcap will include everything else (dest equal to one of the listed IP addresses). In other words, one pcap includes all packets flowing into those machines, and one pcap includes all packets flowing out of those machines. All packets will have either src or dest equal to one of the listed IPs. Can this be done using tcpdump? I would really prefer to use tcpdump since it will be a lot of overhead for me to install any other tools on the Linux machine I am using.
Yes, you can.
First, use tcpdump -w FILE to record the packet flow:
$ sudo tcpdump -i eth0 -s0 -n -e -w /tmp/w.pcap
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
^C123 packets captured
126 packets received by filter
0 packets dropped by kernel
Then use tcpdump -r FILE to read it.
$ tcpdump -n -r /tmp/w.pcap | head
20:48:38.498793 IP 192.168.250.10.22 > 192.168.250.1.49434: Flags [P.], seq 240912301:240912433, ack 2683174485, win 724, options [nop,nop,TS val 8711083 ecr 381715459], length 132
20:48:38.498968 IP 192.168.250.1.49434 > 192.168.250.10.22: Flags [.], ack 132, win 8183, options [nop,nop,TS val 381715490 ecr 8711083], length 0
20:48:40.945504 IP 192.168.250.10.68 > 192.168.250.254.67: BOOTP/DHCP, Request from 00:0c:29:48:aa:d6, length 300
20:48:40.946062 IP 192.168.250.254.67 > 192.168.250.10.68: BOOTP/DHCP, Reply, length 300
20:48:41.045549 IP 192.168.250.10.33131 > 109.231.72.179.22: Flags [S], seq 724706181, win 29200, options [mss 1460,sackOK,TS val 8711720 ecr 0,nop,wscale 6], length 0
20:48:42.539655 IP 192.168.250.1.49471 > 192.168.250.10.22: Flags [S], seq 3387751538, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 381719336 ecr 0,sackOK,eol], length 0
20:48:42.539703 IP 192.168.250.10.22 > 192.168.250.1.49471: Flags [S.], seq 3352023725, ack 3387751539, win 28960, options [mss 1460,sackOK,TS val 8712093 ecr 381719336,nop,wscale 6], length 0
20:48:42.539782 IP 192.168.250.1.49471 > 192.168.250.10.22: Flags [.], ack 1, win 8235, options [nop,nop,TS val 381719336 ecr 8712093], length 0
20:48:42.540066 IP 192.168.250.1.49471 > 192.168.250.10.22: Flags [P.], seq 1:22, ack 1, win 8235, options [nop,nop,TS val 381719336 ecr 8712093], length 21
20:48:42.540078 IP 192.168.250.10.22 > 192.168.250.1.49471: Flags [.], ack 22, win 453, options [nop,nop,TS val 8712093 ecr 381719336], length 0
To select specific hosts, just use a standard tcpdump filter expression, e.g.:
$ tcpdump -n -r /tmp/w.pcap host 8.8.8.8 | head
reading from file /tmp/w.pcap, link-type EN10MB (Ethernet)
20:48:47.595511 IP 192.168.250.10 > 8.8.8.8: ICMP echo request, id 10742, seq 1, length 64
20:48:47.603743 IP 8.8.8.8 > 192.168.250.10: ICMP echo reply, id 10742, seq 1, length 64
20:48:48.597758 IP 192.168.250.10 > 8.8.8.8: ICMP echo request, id 10742, seq 2, length 64
20:48:48.606064 IP 8.8.8.8 > 192.168.250.10: ICMP echo reply, id 10742, seq 2, length 64
20:48:49.600303 IP 192.168.250.10 > 8.8.8.8: ICMP echo request, id 10742, seq 3, length 64
20:48:49.610471 IP 8.8.8.8 > 192.168.250.10: ICMP echo reply, id 10742, seq 3, length 64
Or to exclude that host:
$ tcpdump -n -r /tmp/w.pcap not host 8.8.8.8 | head
reading from file /tmp/w.pcap, link-type EN10MB (Ethernet)
20:48:38.498793 IP 192.168.250.10.22 > 192.168.250.1.49434: Flags [P.], seq 240912301:240912433, ack 2683174485, win 724, options [nop,nop,TS val 8711083 ecr 381715459], length 132
20:48:38.498968 IP 192.168.250.1.49434 > 192.168.250.10.22: Flags [.], ack 132, win 8183, options [nop,nop,TS val 381715490 ecr 8711083], length 0
20:48:40.945504 IP 192.168.250.10.68 > 192.168.250.254.67: BOOTP/DHCP, Request from 00:0c:29:48:aa:d6, length 300
20:48:40.946062 IP 192.168.250.254.67 > 192.168.250.10.68: BOOTP/DHCP, Reply, length 300
20:48:41.045549 IP 192.168.250.10.33131 > 109.231.72.179.22: Flags [S], seq 724706181, win 29200, options [mss 1460,sackOK,TS val 8711720 ecr 0,nop,wscale 6], length 0
20:48:42.539655 IP 192.168.250.1.49471 > 192.168.250.10.22: Flags [S], seq 3387751538, win 65535, options [mss 1460,nop,wscale 4,nop,nop,TS val 381719336 ecr 0,sackOK,eol], length 0
20:48:42.539703 IP 192.168.250.10.22 > 192.168.250.1.49471: Flags [S.], seq 3352023725, ack 3387751539, win 28960, options [mss 1460,sackOK,TS val 8712093 ecr 381719336,nop,wscale 6], length 0
20:48:42.539782 IP 192.168.250.1.49471 > 192.168.250.10.22: Flags [.], ack 1, win 8235, options [nop,nop,TS val 381719336 ecr 8712093], length 0
20:48:42.540066 IP 192.168.250.1.49471 > 192.168.250.10.22: Flags [P.], seq 1:22, ack 1, win 8235, options [nop,nop,TS val 381719336 ecr 8712093], length 21
20:48:42.540078 IP 192.168.250.10.22 > 192.168.250.1.49471: Flags [.], ack 22, win 453, options [nop,nop,TS val 8712093 ecr 381719336], length 0
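You can combine several hosts with or, e.g. tcpdump -n -r /tmp/w.pcap host 8.8.8.8 or host 8.8.4.4. To produce the two files you asked for, use src host together with -w: one run with src host A or src host B ... writes the packets sent by the listed machines, and a second run with the same expression wrapped in not ( ... ) writes everything else.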

Packet socket opened on the loopback device receives every packet twice. How do I filter out the duplicates?

When I open a packet socket on the loopback interface (lo) and listen, every packet is seen twice. Why is that?
A capture on the same interface with tcpdump, however, correctly ignores the duplicates: compare 'packets received by filter' (which includes the duplicate packets) with 'packets captured'. How is this filtering done?
tcpdump -i lo -s 0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
11:00:08.439542 IP 12.0.0.3 > localhost.localdomain: icmp 64: echo request seq 1
11:00:08.439559 IP localhost.localdomain > 12.0.0.3: icmp 64: echo reply seq 1
11:00:09.439866 IP 12.0.0.3 > localhost.localdomain: icmp 64: echo request seq 2
11:00:09.439884 IP localhost.localdomain > 12.0.0.3: icmp 64: echo reply seq 2
11:00:10.439389 IP 12.0.0.3 > localhost.localdomain: icmp 64: echo request seq 3
11:00:10.439410 IP localhost.localdomain > 12.0.0.3: icmp 64: echo reply seq 3
6 packets captured
12 packets received by filter
0 packets dropped by kernel
My code:
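#include <arpa/inet.h>        // htons
#include <linux/if_ether.h>   // ETH_P_ALL
#include <net/if.h>           // struct ifreq
#include <netpacket/packet.h> // struct sockaddr_ll
#include <sys/ioctl.h>        // ioctl, SIOCGIFINDEX
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <iostream>

#define MAX_BUFFER 65536 // large enough for a full loopback frame

int main()
{
    int sockFd;
    if ((sockFd = socket(PF_PACKET, SOCK_DGRAM, 0)) < 0) {
        perror("socket()");
        return -1;
    }
    /* bind the packet socket */
    struct sockaddr_ll addr;
    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "lo", sizeof(ifr.ifr_name) - 1);
    if (ioctl(sockFd, SIOCGIFINDEX, &ifr) == -1) {
        perror("ioctl");
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sll_family = AF_PACKET;
    addr.sll_protocol = htons(ETH_P_ALL);
    addr.sll_ifindex = ifr.ifr_ifindex;
    if (bind(sockFd, (struct sockaddr *)&addr, sizeof(addr))) {
        perror("bind()");
        return -1;
    }
    char buffer[MAX_BUFFER + 1];
    ssize_t tmpVal = 1;
    while (tmpVal > 0) {
        tmpVal = recv(sockFd, buffer, MAX_BUFFER, 0);
        std::cout << "Received Pkt with Bytes " << tmpVal << std::endl;
    }
    close(sockFd);
    return 0;
}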
Figured out the problem.
From the libpcap source code:
* - The loopback device gives every packet twice; on 2.2[.x] kernels,
* if we use PF_PACKET, we can filter out the transmitted version
* of the packet by using data in the "sockaddr_ll" returned by
* "recvfrom()", but, on 2.0[.x] kernels, we have to use
* PF_INET/SOCK_PACKET, which means "recvfrom()" supplies a
* "sockaddr_pkt" which doesn't give us enough information to let
* us do that.
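The listener has to filter out the duplicate itself, using the struct sockaddr_ll returned by recvfrom(). Both copies carry the loopback interface's sll_ifindex, so the interface index alone cannot tell them apart; the transmitted copy is the one whose sll_pkttype is PACKET_OUTGOING.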