Erlang gen_tcp connect question - sockets

Simple question...
This code ..
client() ->
    SomeHostInNet = "localhost", % to make it runnable on one machine
    {ok, Sock} = gen_tcp:connect(SomeHostInNet, 5678,
                                 [binary, {packet, 0}]),
    ok = gen_tcp:send(Sock, "Some Data"),
    ok = gen_tcp:close(Sock).
is very clear except that I don't quite understand what [binary, {packet, 0}] means?
Anyone care to explain?
MadSeb

As per the gen_tcp:connect documentation:
[binary, {packet, 0}] is the list of options that's passed to the connect function.
binary means that the data sent/received on the socket is in binary format (as opposed to, say, list format).
{packet, 0} is a little confusing and it doesn't appear to be covered in the gen_tcp documentation. After talking to some knowledgeable chaps in #erlang on Freenode, I found that the packet option specifies how many bytes are used to indicate the packet length. Behind the scenes, the length is stripped from the packet and Erlang just hands you the payload without the length. Therefore {packet, 0} is the same as a raw packet without a length prefix, and framing is handled entirely by the receiver of the data. For more information on this, check out inet:setopts.
Hope that helps.

{packet,0} is used to indicate that TCP data is delivered directly to the application in an unmodified form.
binary means that a received packet is delivered as a binary (but you can still call gen_tcp:send with a string like "message").
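For illustration, here is a minimal sketch of the difference (host and port are taken from the question; the surrounding module is assumed). With {packet, 4}, the runtime prepends a 4-byte big-endian length to every send and strips it again on a receiving socket opened with the same option, so each send arrives as one complete message. With {packet, 0} (equivalent to raw), the bytes pass through untouched and the receiver must find message boundaries itself.

framed_client() ->
    %% {packet, 4}: gen_tcp adds a 4-byte length header on send; a receiver
    %% opened with the same option gets exactly one whole message per delivery.
    {ok, Sock} = gen_tcp:connect("localhost", 5678, [binary, {packet, 4}]),
    ok = gen_tcp:send(Sock, <<"Some Data">>),
    ok = gen_tcp:close(Sock).

raw_client() ->
    %% {packet, 0} (same as raw): no header is added or stripped; the receiver
    %% sees an arbitrary chunking of the byte stream.
    {ok, Sock} = gen_tcp:connect("localhost", 5678, [binary, {packet, 0}]),
    ok = gen_tcp:send(Sock, <<"Some Data">>),
    ok = gen_tcp:close(Sock).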

Related

What is meant by record or data boundaries in the sense of TCP & UDP protocol?

I am learning about sockets and came across the term data (or record) boundaries in the context of the SOCK_SEQPACKET protocol. Can anyone explain in simple words what a data boundary is, and how SOCK_SEQPACKET differs from SOCK_STREAM & SOCK_DGRAM?
This answer https://stackoverflow.com/a/9563694/1076479 has a good succinct explanation of message boundaries (a different name for "record boundaries").
Extending that answer to SOCK_SEQPACKET:
SOCK_STREAM provides reliable, sequenced communication of streams of data between two peers. It does not maintain message (record) boundaries, which means the application must manage its own boundaries on top of the stream provided.
SOCK_DGRAM provides unreliable transmission of datagrams. Datagrams are self-contained capsules and their boundaries are maintained. That means if you send a 20 byte buffer on peer A, peer B will receive a 20 byte message. However, they can be dropped, or received out of order, and it's up to the application to figure that out and handle it.
SOCK_SEQPACKET is a newer technology that is not yet widely used, but tries to marry the benefits of both of the above. That is, it provides reliable, sequenced communication that also transmits entire "datagrams" as a unit (and hence maintains message boundaries).
It's easiest to demonstrate the concept of message boundaries by showing what happens when they're neglected. Beginners often post client code like this here on SO (using Python for convenience):
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('192.168.4.122', 9000))
s.send(b'FOO') # Send string 1
s.send(b'BAR') # Send string 2
reply = s.recv(128) # Receive reply
And server code similar to this:
import socket

lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lsock.bind(('', 9000))
lsock.listen(5)
csock, caddr = lsock.accept()
string1 = csock.recv(128) # Receive first string
string2 = csock.recv(128) # Receive second string <== XXXXXXX
csock.send(b'Got your messages') # Send reply
They then don't understand why the server hangs on the second recv call while the client is hung on its own recv call. That happens because both strings the client sent may get bundled together and received as a single unit in the first recv on the server side. That is, the message boundary between the two logical messages was not preserved, and so string1 will often contain both chunks run together: 'FOOBAR'.
(Often there are other timing-related aspects to the code that influence when/whether that actually happens or not.)

Parsing ByteString from Socket fails

We are writing a message broker in Haskell (HMB). Messages therefore have to be parsed (Data.Binary) after they are received from the socket (Network.Socket). We've been testing on loopback (localhost) so far, for both producing and parsing messages, and this worked quite well. When we benchmark by producing messages from another machine, we run into problems: suddenly the parser does not have enough bytes to parse.
The first 4 bytes of each message define the length of the message and thus describe the message to be parsed. As hinted above, we do parsing with Data.Binary, so this is lazy. For testing purposes we switched parsing of the first 4 bytes to strict by using the cereal library. This gave the same problem. We have now even tried to completely parse the requests with cereal only, and the problem still remains.
In the code you'll see that we do threading. However, we also tried without a channel (single threaded setup) but this didn't solve the problem either.
Here is a part of the code (Thread1) where the received bytes are written to a channel to be further consumed/parsed. (As mentioned, nothing changes if we omit channeling and directly parse input):
runConnection :: (Socket, SockAddr) -> RequestChan -> Bool -> IO ()
runConnection conn chan False = return ()
runConnection conn chan True = do
  r <- recvFromSock conn
  case r of
    Left e -> do
      handleSocketError conn e
      runConnection conn chan False
    Right input -> do
      threadDelay 5000 -- THIS FIXES THE PROBLEM!?
      writeToReqChan conn chan input
      runConnection conn chan True
Here is the part (Thread2) where the input is being parsed:
runApiHandler :: RequestChan -> ResponseChan -> IO ()
runApiHandler rqChan rsChan = do
  (conn, req) <- readChan rqChan
  case readRequest req of -- readRequest IS THE PARSER
    Left (bs, bo, e) -> handleHandlerError conn $ ParseRequestError e
    Right (bs, bo, rm) -> do
      res <- handleRequest rm
      case res of
        Left e   -> handleHandlerError conn e
        Right bs -> writeToResChan conn rsChan bs
  runApiHandler rqChan rsChan
Now I have figured out that if the parsing is delayed a bit (see the threadDelay in the first code block), everything works fine. Which basically means the parser doesn't wait for enough bytes to be received from the socket.
Why is that? Why does the parser not wait for the socket to have enough bytes? Is there a general mistake in our setup?
I would bet that the problem has nothing to do with the parser but is instead due to the blocking semantics of UNIX sockets.
While a loopback interface will likely pass the packet directly from the sender to the receiver, an Ethernet interface may need to break up the packet to fit in the Maximum Transmission Unit (MTU) of the link. This is known as packet fragmentation.
The len argument to the recv system call is merely the upper bound on the received length (e.g. the size of the target buffer); the call may produce less data than you ask for. To quote the manpage,
If no messages are available at the socket, the receive calls wait for a message to arrive, unless the socket is nonblocking (see fcntl(2)), in which case the value -1 is returned and the external variable errno is set to EAGAIN or EWOULDBLOCK. The receive calls normally return any data available, up to the requested amount, rather than waiting for receipt of the full amount requested.
For this reason, you may need multiple recv calls to retrieve the entire packet. Your example works if you delay the recv because the operating system can reassemble the original packet: all fragments have arrived by the time the data is requested.
As meiersi pointed out, a variety of streaming I/O libraries have been developed in the Haskell world to solve this problem, among others; these include pipes, conduit, and io-streams. Depending upon your goals, this may be a natural way to handle the issue.
You might want to try the socket support in conduit-extra combined with binary-conduit to properly handle parsing of the chunked stream, which arises for the reasons pointed out by bgamari.
First of all, consider yourself lucky to have observed this. On many platforms perhaps only one out of a thousand packets exhibits this behaviour, causing a lot of such (sorry) bad networking code to fail rarely and seemingly at random.
The problem is that you start processing before the data is ready. Instead of the threadDelay (which introduces a fixed delay and might not be long enough in all cases), the solution is to make sure you have at least one complete item/message/packet before you start processing it. Your protocol, where the first 32-bit word contains the length, is perfect for this. Read data until you have at least 4 bytes (the length). Then read data until you have the required number of bytes. If any call to recvFromSock returns fewer bytes than required, call it again to get more. Remember to also handle the case of 0 bytes; that means the other party closed the connection.
I have implemented this for a similar protocol (SMPP, where packets also start with the length) and it works perfectly.
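The same read-until-complete logic is easy to sketch in Erlang, the language most of this thread uses (a hedged illustration, not the asker's Haskell code). In passive mode, gen_tcp:recv(Socket, N) with N > 0 returns only once exactly N bytes have arrived (or the peer closes), so the short reads the OS may deliver are handled for you:

%% Hedged sketch: read one length-prefixed message from a socket opened with
%% [binary, {packet, 0}, {active, false}]. A 4-byte big-endian length prefix
%% is assumed, matching the protocol described in the question.
recv_message(Socket) ->
    case gen_tcp:recv(Socket, 4) of                % first the 4-byte length
        {ok, <<Len:32/big-unsigned>>} when Len > 0 ->
            gen_tcp:recv(Socket, Len);             % then exactly Len payload bytes
        {ok, <<0:32>>} ->
            {ok, <<>>};                            % empty message
        {error, Reason} ->
            {error, Reason}                        % e.g. the peer closed the socket
    end.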

Examine data at in callout driver for FWPM_LAYER_EGRESS_VSWITCH_TRANSPORT_V4 layer in WFP

I am writing a callout driver for Hyper-V 2012 where I need to filter the packets sent from virtual machines.
I added a filter at the FWPM_LAYER_EGRESS_VSWITCH_TRANSPORT_V4 layer in WFP. The callout function receives a packet buffer which I cast to NET_BUFFER_LIST. I am doing the following to get the data pointer:
pNetBuffer = NET_BUFFER_LIST_FIRST_NB((NET_BUFFER_LIST*)pClassifyData->pPacket);
pContiguousData = NdisGetDataBuffer(pNetBuffer, NET_BUFFER_DATA_LENGTH(pNetBuffer), 0, 1, 0);
I have a simple client-server application to test the packet data. The client is on a VM and the server is on another machine. As I observed, the data sent from the client to the server is truncated and some garbage value is appended at the end. There is no issue sending messages from the server to the client. If I don't add this layer filter, the client and server work without any issue.
The callback function receives metadata which includes ipHeaderSize and transportHeaderSize. Both of these values are zero. Are these values correct, or should they be non-zero?
Can somebody help me to extract the data from packet in callout function and forward it safely to further layers?
Thank You.
These are TCP packets. I looked into the size and offset information, and it seems the problem is consistent across packets.
I checked the values below in (NET_BUFFER_LIST*)pClassifyData->pPacket:
NET_BUFFER_LIST->NetBufferListHeader->NetBufferListData->FirstNetBuffer->NetBufferHeader->NetBufferData->CurrentMdl->MappedSystemVa
Only the first 24 bytes are sent correctly; the remaining bytes are garbage.
For example, the total size of the packet is 0x36 + 0x18 = 0x4E. I don't know what is in the first 0x36 bytes, which is constant for all the packets. Is it a TCP/IP header? The second part, 0x18 bytes, is the actual data which I sent.
I even tried the NdisQueryMdl() API to retrieve data from the MDL list.
So on the receiver side only the first 24 bytes are correct and the rest is garbage. How do I read the full buffer from a NET_BUFFER_LIST?

erlang sockets and gen_server - no data received on server side

In a nutshell:
I am trying to make a socket server to which clients connect and send/receive messages (based on the sockserv code in the Learn You Some Erlang tutorial, http://learnyousomeerlang.com/buckets-of-sockets).
Server side components:
supervisor - unique, started at the very beginning, spawns processes with gen_server behaviour
gen_server behaviour processes - each one deals with a connection.
Client side:
a client which connects to the socket, sends a few bytes of data and then disconnects.
Code details
My code is pretty much the same as in the presented tutorial. The supervisor is identical. The gen_server component is simplified so that it has only one handle_info case which is supposed to catch everything and just print it.
Problem
The connection succeeds, but when the client sends data, the server behaves as though no data is received (I am expecting that handle_info is called when that happens).
handle_info does get called but only when the client disconnects and this event is reported with a message.
My attempts
I have played around with different clients written in Erlang and Java, and I have tried setting the active/passive state of the socket. The author of the tutorial sets {active, once} after sending a message. I ended up just setting {active, true} after the AcceptSocket is created, as shown below (the gen_server process is initialized with a state which contains the original ListenSocket created by the supervisor):
handle_cast(accept, S = #state{socket=ListenSocket}) ->
    {ok, AcceptSocket} = gen_tcp:accept(ListenSocket),
    io:format("accepted connection ~n", []),
    sockserv_sup:start_socket(), % a new acceptor is born, praise the lord
    inet:setopts(AcceptSocket, [{active, true}]),
    send(AcceptSocket, "Yellow", []),
    {noreply, S#state{socket=AcceptSocket, next=name}}.

send(Socket, Str, Args) ->
    ok = gen_tcp:send(Socket, io_lib:format(Str++"~n", Args)),
    ok.

handle_info(E, S) ->
    io:format("mothereffing unexpected: ~p~n", [E]),
    {noreply, S}.
It has absolutely no effect. handle_info only gets called when the connection is lost because the client disconnects. Whenever the client sends data, nothing happens.
What could be the problem? I have spent quite some time on this and I really have no idea.
Many thanks.
Have you tried setting some of the other options from http://www.erlang.org/doc/man/inet.html#setopts-2 in
inet:setopts(AcceptSocket, [{active, true}])
For example:
{packet, line} to read in a line at a time, and
binary to read in the data as a binary.
I also was working through a similar exercise based on that tutorial recently and my options used were:
inet:setopts(LSocket, [{active,true}, {packet, line}, binary, {reuseaddr, true}]),
To conclude: watch out for the options. I was indeed not paying attention to the implications of the set of options I used. I tried a more narrowed-down setup and worked it out. My problem was the {packet, line} option, which means that \n is treated as the message delimiter.
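For illustration, here is a minimal hedged sketch (module name and setup are assumptions, not the asker's sockserv code) of what a process sees once {packet, line} and {active, true} are set: each newline-terminated chunk arrives as exactly one {tcp, Socket, Line} message.

%% Hedged sketch: a bare acceptor that receives line-delimited data in active
%% mode. The accepted socket inherits the options set on the listen socket.
-module(line_recv).   % hypothetical module name
-export([start/1]).

start(Port) ->
    {ok, LSock} = gen_tcp:listen(Port, [binary, {packet, line},
                                        {active, true}, {reuseaddr, true}]),
    {ok, Sock} = gen_tcp:accept(LSock),
    loop(Sock).

loop(Sock) ->
    receive
        {tcp, Sock, Line} ->                  % one complete "...\n" per message
            io:format("got line: ~p~n", [Line]),
            loop(Sock);
        {tcp_closed, Sock} ->
            ok
    end.

The flip side is exactly the behaviour described in the question: if the client never sends a trailing newline, no {tcp, ...} message is delivered to the process even though bytes are sitting in the socket buffer.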

limitation of the reception buffer

I established a connection with a client this way:
gen_tcp:listen(1234,[binary,{packet,0},{reuseaddr,true},{active,false},{recbuf,2048}]).
This code performs message processing:
loop(Socket) ->
    inet:setopts(Socket, [{active, once}]),
    receive
        {tcp, Socket, Data} ->
            handle(Data),
            loop(Socket);
        {Pid, Cmd} ->
            gen_tcp:send(Socket, Cmd),
            loop(Socket);
        {tcp_closed, Socket} ->
            % ...
    end.
My OS is Windows. When the size of the message is 1024 bytes, I lose bytes in Data. The server sends ACK + FIN to the client.
I believe that Erlang is limited to 1024 bytes, which is why I set recbuf.
Where is the problem: Erlang, Windows, or hardware?
Thanks.
You may be setting the receive buffer far too small. Erlang certainly isn't limited to a 1024 byte buffer. You can check for yourself by doing the following in the shell:
{ok, S} = gen_tcp:connect("www.google.com", 80, [{active,false}]),
O = inet:getopts(S, [recbuf]),
gen_tcp:close(S),
O.
On Mac OS X I get a default receive buffer size of about 512Kb.
With {packet, 0} parsing, you'll receive TCP data in whatever chunks the network stack chooses to deliver, so you have to do message boundary parsing and buffering yourself. Do you have a reliable way to detect message boundaries in the wire protocol? If so, receive the TCP data and append it to a buffer variable until you have a complete message. Then call handle on the complete message and remove it from the buffer before continuing, as sketched below.
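As an illustration only: the wire protocol isn't known from the question, so a newline delimiter and the existing handle/1 function are assumed here. A buffering loop might look roughly like this (start it with loop(Socket, <<>>)):

%% Hedged sketch: accumulate raw {packet, 0} data in a binary buffer and only
%% call handle/1 (the asker's existing handler) once a complete,
%% newline-terminated message is present. The delimiter is an assumption;
%% use whatever the real protocol defines.
loop(Socket, Buffer) ->
    inet:setopts(Socket, [{active, once}]),
    receive
        {tcp, Socket, Data} ->
            loop(Socket, process(<<Buffer/binary, Data/binary>>));
        {tcp_closed, Socket} ->
            ok
    end.

process(Buffer) ->
    case binary:split(Buffer, <<"\n">>) of
        [Msg, Rest] ->              % a full message is available
            handle(Msg),
            process(Rest);          % there may be more complete messages buffered
        [_Incomplete] ->
            Buffer                  % keep the partial data for the next packet
    end.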
We could probably help you more if you gave us some information on the client and the protocol in use.