Receiving strings of unknown length? - sockets

So I'm converting a Python program I wrote to Erlang, and it's been a long time since I used Erlang, so I guess I'm back at beginner level. Anyway, in my experience every language I've used has socket send/recv functions that return the length of the data sent or received. Erlang's gen_tcp, however, doesn't seem to do that.
So when I call send/recv/or inet:setopts it knows when the packet has ended? Will I need to write a looping recvAll/sendAll function so I can find the escape or \n in the packet(string) I wish to receive?
http://erlang.org/doc/man/gen_tcp.html#recv-2
Example code I'm using:
server(LS) ->
    case gen_tcp:accept(LS) of
        {ok,S} ->
            loop(S),
            server(LS);
        Other ->
            io:format("accept returned ~w - goodbye!~n",[Other]),
            ok
    end.

loop(S) ->
    inet:setopts(S,[{active,once}]),
    receive
        {tcp,S,Data} ->
            Answer = process(Data), % Not implemented in this example
            gen_tcp:send(S,Answer),
            loop(S);
        {tcp_closed,S} ->
            io:format("Socket ~w closed [~w]~n",[S,self()]),
            ok
    end.
Just from looking at examples and documentation it seems like Erlang just knows. I want to confirm, because the data being received can be anywhere from 20 bytes to 9216 bytes long and could arrive in chunks, since the client is a PHP socket library I'm writing.
Thank you,
Ajm.

TL;DR
So when I call send/recv/or inet:setopts it knows when the packet has
ended?
No, it doesn't.
Will I need to write a looping recvAll/sendAll function so I can find
the escape or \n in the packet(string) I wish to receive?
Yes, generally, you will. But Erlang can do this job for you.
HOW?
You can't rely on TCP to preserve message boundaries. TCP is a byte stream: it may split your data into arbitrarily sized chunks, and your program has to reassemble those chunks and parse the stream itself. So, first, your protocol must be "self-delimiting". For example, you can:
In a binary protocol - precede each packet with its length (a fixed-size field). A protocol frame with a two-byte length then looks like this: <<PacketLength:16/big-unsigned-integer, Packet/binary>> (in Erlang's bit syntax the size of an integer field is given in bits, so a two-byte field is 16).
In a text protocol - terminate each line with a line feed character.
Erlang can help you with this. Take a look at http://erlang.org/doc/man/gen_tcp.html#type-option. There is an important option:
{packet, PacketType}(TCP/IP sockets)
Defines the type of packets to use for a socket. The following values are valid:
raw | 0
No packaging is done.
1 | 2 | 4
Packets consist of a header specifying the number of bytes in the packet, followed by that number of bytes. The length of header can be one, two, or four bytes; containing an unsigned integer in big-endian byte order. Each send operation will generate the header, and the header will be stripped off on each receive operation.
In current implementation the 4-byte header is limited to 2Gb.
line
Line mode, a packet is a line terminated with newline, lines longer than the receive buffer are truncated.
The last option (line) is the most interesting one for you. If you set this option, Erlang will parse the input stream internally and yield packets split by lines.
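Since the original program was in Python, here is a rough sketch of what the two framing styles above amount to when done by hand. FrameDecoder and encode_frame are hypothetical names made up for this illustration; the "length2" mode assumes a 2-byte big-endian length header, and the "line" mode in this sketch keeps the trailing newline on each line.

```python
import struct


class FrameDecoder:
    """Reassembles frames from arbitrarily sized TCP chunks (hypothetical helper)."""

    def __init__(self, mode="length2"):
        self.mode = mode          # "length2": 2-byte length header; "line": newline-terminated
        self.buf = bytearray()

    def feed(self, chunk):
        """Add received bytes and return every complete frame now available."""
        self.buf += chunk
        frames = []
        while True:
            frame = self._next_frame()
            if frame is None:
                return frames
            frames.append(frame)

    def _next_frame(self):
        if self.mode == "line":
            end = self.buf.find(b"\n")
            if end < 0:
                return None       # no complete line yet
            frame = bytes(self.buf[:end + 1])
            del self.buf[:end + 1]
            return frame
        # 2-byte big-endian length header followed by that many payload bytes.
        if len(self.buf) < 2:
            return None
        (length,) = struct.unpack_from(">H", self.buf, 0)
        if len(self.buf) < 2 + length:
            return None           # payload not fully received yet
        frame = bytes(self.buf[2:2 + length])
        del self.buf[:2 + length]
        return frame


def encode_frame(payload):
    """Prefix payload with its length as a 2-byte big-endian integer."""
    return struct.pack(">H", len(payload)) + payload
```

In use, every result of recv() would be passed to feed(), which returns zero or more complete packets regardless of how the stream happened to be chunked.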

Related

Download multiple files from FTP server (using sockets)

I want to download multiple files from an FTP server using the sockets directly. I'm using Swift 5 with the BlueSocket library, which is basically a wrapper, so the commands are the same as if I did everything through e.g. Windows console.
FTP commands:
Login + connect cmdSocket
cmdSocket send: PASV
cmdSocket receive: 227 Entering Passive Mode
cmdSocket send: TYPE I
cmdSocket receive: 200 Type set to I
Connect dataSocket to Passive Mode IP/port
cmdSocket send: CWD myFolder
cmdSocket receive: 250 CWD command successful
Looping through all the files:
cmdSocket send: RETR myFileX
cmdSocket receive: Either "150 Downloading in BINARY file" or "125 Data connection already open; Transfer starting"
dataSocket: Receive data and save it to storage
cmdSocket receive: 226 Transfer complete
This works fine for the first file ("myFile1") but everything changes in the second loop iteration ("myFile2"):
cmdSocket send: RETR myFile2
cmdSocket receive: 150 Opening BINARY mode data connection.
Now the dataSocket won't return any bytes and sometimes it also receives "425 Cannot open data connection." in addition to "150". I tried to use "myFile1" twice but with the same result.
I'm guessing that the order is off but what is wrong exactly? Do I have to change the type for every file, so within the loop? Do I have to open a new data socket for every file or maybe send some "reset" command after "226" is received for the first file?
By default, FTP uses the STREAM transmission mode on data transfers. Under STREAM mode, End-Of-File is signaled by closing the data connection. As such, you can send only 1 file per data connection in STREAM mode.
To work around that, you will have to either:
issue a new PORT/PASV command to establish a new data connection for each individual file.
issue a MODE command to switch to BLOCK or COMPRESSED transmission mode before transferring files. Neither mode signals EOF by closing the data connection; instead, each sends an explicit marker over the data connection at the end of each file, thus allowing multiple files to be transferred over a single data connection.
For more details, read the official FTP protocol specification, RFC 959, specifically sections 3.3 "DATA CONNECTION MANAGEMENT" and 3.4 "TRANSMISSION MODES":
3.3. DATA CONNECTION MANAGEMENT
Default Data Connection Ports: All FTP implementations must
support use of the default data connection ports, and only the
User-PI may initiate the use of non-default ports.
Negotiating Non-Default Data Ports: The User-PI may specify a
non-default user side data port with the PORT command. The
User-PI may request the server side to identify a non-default
server side data port with the PASV command. Since a connection
is defined by the pair of addresses, either of these actions is
enough to get a different data connection, still it is permitted
to do both commands to use new ports on both ends of the data
connection.
Reuse of the Data Connection: When using the stream mode of data
transfer the end of the file must be indicated by closing the
connection. This causes a problem if multiple files are to be
transfered in the session, due to need for TCP to hold the
connection record for a time out period to guarantee the reliable
communication. Thus the connection can not be reopened at once.
There are two solutions to this problem. The first is to
negotiate a non-default port. The second is to use another
transfer mode.
A comment on transfer modes. The stream transfer mode is
inherently unreliable, since one can not determine if the
connection closed prematurely or not. The other transfer modes
(Block, Compressed) do not close the connection to indicate the
end of file. They have enough FTP encoding that the data
connection can be parsed to determine the end of the file.
Thus using these modes one can leave the data connection open
for multiple file transfers.
3.4. TRANSMISSION MODES
The next consideration in transferring data is choosing the
appropriate transmission mode. There are three modes: one which
formats the data and allows for restart procedures; one which also
compresses the data for efficient transfer; and one which passes
the data with little or no processing. In this last case the mode
interacts with the structure attribute to determine the type of
processing. In the compressed mode, the representation type
determines the filler byte.
All data transfers must be completed with an end-of-file (EOF)
which may be explicitly stated or implied by the closing of the
data connection. For files with record structure, all the
end-of-record markers (EOR) are explicit, including the final one.
For files transmitted in page structure a "last-page" page type is
used.
NOTE: In the rest of this section, byte means "transfer byte"
except where explicitly stated otherwise.
For the purpose of standardized transfer, the sending host will
translate its internal end of line or end of record denotation
into the representation prescribed by the transfer mode and file
structure, and the receiving host will perform the inverse
translation to its internal denotation. An IBM Mainframe record
count field may not be recognized at another host, so the
end-of-record information may be transferred as a two byte control
code in Stream mode or as a flagged bit in a Block or Compressed
mode descriptor. End-of-line in an ASCII or EBCDIC file with no
record structure should be indicated by <CRLF> or <NL>,
respectively. Since these transformations imply extra work for
some systems, identical systems transferring non-record structured
text files might wish to use a binary representation and stream
mode for the transfer.
The following transmission modes are defined in FTP:
3.4.1. STREAM MODE
The data is transmitted as a stream of bytes. There is no
restriction on the representation type used; record structures
are allowed.
In a record structured file EOR and EOF will each be indicated
by a two-byte control code. The first byte of the control code
will be all ones, the escape character. The second byte will
have the low order bit on and zeros elsewhere for EOR and the
second low order bit on for EOF; that is, the byte will have
value 1 for EOR and value 2 for EOF. EOR and EOF may be
indicated together on the last byte transmitted by turning both
low order bits on (i.e., the value 3). If a byte of all ones
was intended to be sent as data, it should be repeated in the
second byte of the control code.
If the structure is a file structure, the EOF is indicated by
the sending host closing the data connection and all bytes are
data bytes.
3.4.2. BLOCK MODE
The file is transmitted as a series of data blocks preceded by
one or more header bytes. The header bytes contain a count
field, and descriptor code. The count field indicates the
total length of the data block in bytes, thus marking the
beginning of the next data block (there are no filler bits).
The descriptor code defines: last block in the file (EOF) last
block in the record (EOR), restart marker (see the Section on
Error Recovery and Restart) or suspect data (i.e., the data
being transferred is suspected of errors and is not reliable).
This last code is NOT intended for error control within FTP.
It is motivated by the desire of sites exchanging certain types
of data (e.g., seismic or weather data) to send and receive all
the data despite local errors (such as "magnetic tape read
errors"), but to indicate in the transmission that certain
portions are suspect). Record structures are allowed in this
mode, and any representation type may be used.
The header consists of the three bytes. Of the 24 bits of
header information, the 16 low order bits shall represent byte
count, and the 8 high order bits shall represent descriptor
codes as shown below.
Block Header
+----------------+----------------+----------------+
| Descriptor     |    Byte Count                   |
|         8 bits |                      16 bits    |
+----------------+----------------+----------------+
The descriptor codes are indicated by bit flags in the
descriptor byte. Four codes have been assigned, where each
code number is the decimal value of the corresponding bit in
the byte.
Code     Meaning
 128     End of data block is EOR
  64     End of data block is EOF
  32     Suspected errors in data block
  16     Data block is a restart marker
With this encoding, more than one descriptor coded condition
may exist for a particular block. As many bits as necessary
may be flagged.
The restart marker is embedded in the data stream as an
integral number of 8-bit bytes representing printable
characters in the language being used over the control
connection (e.g., default--NVT-ASCII). <SP> (Space, in the
appropriate language) must not be used WITHIN a restart marker.
For example, to transmit a six-character marker, the following
would be sent:
+--------+--------+--------+
|Descrptr|  Byte count     |
|code= 16|             = 6 |
+--------+--------+--------+
+--------+--------+--------+
| Marker | Marker | Marker |
| 8 bits | 8 bits | 8 bits |
+--------+--------+--------+
+--------+--------+--------+
| Marker | Marker | Marker |
| 8 bits | 8 bits | 8 bits |
+--------+--------+--------+
3.4.3. COMPRESSED MODE
There are three kinds of information to be sent: regular data,
sent in a byte string; compressed data, consisting of
replications or filler; and control information, sent in a
two-byte escape sequence. If n>0 bytes (up to 127) of regular
data are sent, these n bytes are preceded by a byte with the
left-most bit set to 0 and the right-most 7 bits containing the
number n.
Byte string:
 1       7                8                     8
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+     +-+-+-+-+-+-+-+-+
|0|       n     | |    d(1)       | ... |      d(n)     |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+     +-+-+-+-+-+-+-+-+
                              ^             ^
                              |---n bytes---|
                                  of data
String of n data bytes d(1),..., d(n)
Count n must be positive.
To compress a string of n replications of the data byte d, the
following 2 bytes are sent:
Replicated Byte:
 2       6               8
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|1 0|     n     | |       d       |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
A string of n filler bytes can be compressed into a single
byte, where the filler byte varies with the representation
type. If the type is ASCII or EBCDIC the filler byte is <SP>
(Space, ASCII code 32, EBCDIC code 64). If the type is Image
or Local byte the filler is a zero byte.
Filler String:
 2       6
+-+-+-+-+-+-+-+-+
|1 1|     n     |
+-+-+-+-+-+-+-+-+
The escape sequence is a double byte, the first of which is the
escape byte (all zeros) and the second of which contains
descriptor codes as defined in Block mode. The descriptor
codes have the same meaning as in Block mode and apply to the
succeeding string of bytes.
Compressed mode is useful for obtaining increased bandwidth on
very large network transmissions at a little extra CPU cost.
It can be most effectively used to reduce the size of printer
files such as those generated by RJE hosts.
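To make the Block mode description above a little more concrete, here is a small Python sketch that splits a block-mode byte stream into blocks using the 3-byte header (8-bit descriptor, 16-bit byte count). The function names are made up for illustration, and the sketch assumes the count covers only the data bytes that follow the header. It stops at the block flagged EOF and reports how many bytes it consumed, so whatever follows on the same data connection would belong to the next file.

```python
import struct

# Descriptor bit flags as listed in the RFC 959 excerpt above.
EOR, EOF, SUSPECT, RESTART = 128, 64, 32, 16


def make_block(descriptor, payload):
    """Build one block: 1-byte descriptor, 2-byte big-endian count, then data."""
    return struct.pack(">BH", descriptor, len(payload)) + payload


def parse_file_blocks(data):
    """Return (blocks, consumed) for one file; blocks are (descriptor, payload) pairs."""
    blocks = []
    pos = 0
    while pos < len(data):
        if len(data) - pos < 3:
            raise ValueError("truncated block header")
        descriptor, count = struct.unpack_from(">BH", data, pos)
        pos += 3
        payload = data[pos:pos + count]
        if len(payload) < count:
            raise ValueError("truncated block payload")
        pos += count
        blocks.append((descriptor, payload))
        if descriptor & EOF:
            break  # end of this file; the connection itself stays open
    return blocks, pos
```

This is only the framing layer; a real client would still need to issue MODE B on the control connection first and confirm that the server accepts it.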
FTP works as follows:
You set up a TCP connection to the FTP server.
Over that socket you can send commands, and the server will reply over that same socket.
You can use the PORT command:
Client: PORT 192,168,1,2,7,139 //The client wants the server to send to port number 1931 on the client machine. The last two numbers are the high and low bytes of the port: 7 and 139 are 0x07 and 0x8B, and 0x078B = 7 * 256 + 139 = 1931
Server: 200 PORT command successful.
Client: RETR Yoyodyne.TXT //Download "Yoyodyne.TXT."
Server: 150 Opening ASCII mode data connection for Yoyodyne.TXT. //The server now connects out from its port 20 on 172.16.62.36 to port 1931 on 192.168.1.2.
Server: 226 Transfer completed. //That succeeded: the data has been sent over the established data connection, and that connection is now closed.
So a new RETR first needs a new PORT command. Even if the server treats the previous PORT command as persistent, the data connection still has to be established again.
Each file has its own socket over which its data is sent or received, and that socket is closed after the file is transferred. So you need a new socket, and a new connect, each time.
More details here:
https://www.ncftp.com/libncftp/doc/ftp_overview.html
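The port arithmetic in the PORT example above can be written out as a short Python sketch. The helper names are hypothetical; the "h1,h2,h3,h4,p1,p2" layout is the one used by the PORT argument, and the 227 reply to PASV carries the address in the same form.

```python
def parse_port_argument(arg):
    """Decode 'h1,h2,h3,h4,p1,p2' into an (ip, port) pair."""
    parts = [int(p) for p in arg.split(",")]
    if len(parts) != 6 or not all(0 <= p <= 255 for p in parts):
        raise ValueError("expected six comma-separated byte values")
    ip = ".".join(str(p) for p in parts[:4])
    port = parts[4] * 256 + parts[5]   # high byte, then low byte
    return ip, port


def format_port_argument(ip, port):
    """Encode an (ip, port) pair back into 'h1,h2,h3,h4,p1,p2'."""
    return ",".join(ip.split(".") + [str(port // 256), str(port % 256)])
```

For each file, the client would obtain a fresh address this way (from a new PASV reply, or by sending a new PORT) and open a new data socket to it.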

Converting from TClientSocket to Indy: What is equivalent to ReceiveLength?

I am using XE7 and converting an app from using the ScktComp components to Indy, using TIdTCPClient in place of TClientSocket. At present I am just putting in likely equivalents to get it to compile. Most of it has been converted, except for this snippet:
if (Socket.ReceiveLength > 0) then
begin
  s := Socket.ReceiveText;
which I have converted to
s := Socket.IOHandler.ReadLn
I have no equivalent for ReceiveLength.
Any ideas?
IOHandler.ReadLn() is not the correct equivalent of ReceiveText().
ReceiveLength() returns the number of unread bytes that are currently in the socket's buffer. ReceiveText() simply reads whatever raw bytes are currently in the socket's buffer and returns them in a string variable. It is a wrapper for a single call to ReceiveBuf() using ReceiveLength() as the buffer size.
IOHandler.ReadLn() reads from the IOHandler's own memory buffer, populating it with bytes from the socket's buffer as needed, until it encounters the specified terminator (which is a LF character by default), no matter how many reads it takes to accomplish that.
There is no direct translation of ReceiveLength() in Indy. However, the closest equivalent to your snippet would be to call IOHandler.CheckForDataOnSource() followed by IOHandler.InputBufferAsString(), e.g.:
Socket.IOHandler.CheckForDataOnSource(0);
if not Socket.IOHandler.InputBufferIsEmpty then
begin
  s := Socket.IOHandler.InputBufferAsString;
Alternatively, you can call IOHandler.ReadBytes() with its AByteCount parameter set to -1, and then convert the returned byte array to a string:
buf := nil;
Socket.IOHandler.ReadBytes(buf, -1);
if buf <> nil then
begin
  s := BytesToString(buf);
That being said, I have to ask why you are using ReceiveText() in the first place. It returns a string of arbitrary bytes, so it doesn't really lend itself well to most communication needs, can break up textual data in unpredictable ways, and is not suited for binary data at all. Network protocols usually have structure to them, and TClientSocket usage typically requires code to manually buffer bytes and parse structured data from that buffer - things Indy is designed to handle for you. You should focus more on the goal you want to achieve and less on the particulars of how to get it. If you need to read an integer, ask Indy to read an integer. If you need to read a string of a particular length or ending with a particular delimiter, ask Indy to read a string. If you need to read a block of X bytes, ask Indy to read X bytes. Indy has many read/write methods available to automate common tasks that you would normally have to do manually.
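The idea of asking the library for "an integer", "a line" or "X bytes" instead of "whatever is in the buffer" is not specific to Indy. As a language-neutral illustration, here is a minimal Python sketch of such a reader; BufferedReader and its methods are hypothetical names for this example and are not part of Indy or of any particular library.

```python
import socket


class BufferedReader:
    """Hypothetical helper: reads structured data from a stream socket."""

    def __init__(self, sock):
        self.sock = sock
        self.buf = bytearray()

    def _fill(self):
        # Pull more bytes from the socket into our own buffer.
        chunk = self.sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        self.buf += chunk

    def read_exact(self, n):
        """Block until exactly n bytes are available, then return them."""
        while len(self.buf) < n:
            self._fill()
        data = bytes(self.buf[:n])
        del self.buf[:n]
        return data

    def read_line(self, terminator=b"\n"):
        """Block until the terminator arrives; return the line without it."""
        while True:
            end = self.buf.find(terminator)
            if end >= 0:
                line = bytes(self.buf[:end])
                del self.buf[:end + len(terminator)]
                return line
            self._fill()

    def read_uint32(self):
        """Read a 4-byte big-endian unsigned integer."""
        return int.from_bytes(self.read_exact(4), "big")
```

The caller never sees how many recv() calls were needed, which is the same division of labour the answer describes for ReadLn() and the other IOHandler read methods.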

tcprewrite - truncated packet error

I use tcprewrite to rewrite my raw pcap files from the CAIDA dataset (changing the MAC addresses and then the IP addresses), and I came across this problem.
The command I use is as follows:
sudo tcprewrite --infile=xxx.pcap --dlt=enet --outfile=yyy.pcap --enet-dmac=00:00:00:03 --enet-smac=00:00:00:1f
The error:
pcap was captured using a snaplen of 65000 bytes. This may mean you have truncated packets.
I searched the web for a solution but unfortunately could not solve it. According to this thread
http://sourceforge.net/p/tcpreplay/mailman/tcpreplay-users/?viewmonth=201201
the error arises because the packets were not captured from the beginning.
Does anyone have an idea how to solve this problem?
pcap was captured using a snaplen of 65000 bytes. This may mean you have truncated packets.
Or it may not and, in fact, it probably doesn't mean you have truncated packets.
The packet capture mechanism on some systems requires that some maximum packet length be specified, but it can be something sufficiently large that, in practice, no packets will be larger than the maximum and thus no packets will be truncated.
65535 was often used as such a maximum - Wireshark's done so even before it was renamed Wireshark, and tcpdump was first changed so that "-s 0" would use 65535 and then changed to default to 65535. tcpreplay is treating any maximum packet length < 65535 as "not trying to capture the entire packet", but warning about, for example, 65534 is a bit silly.
Tcpdump and Wireshark recently boosted the limit to 262144 to handle some USB captures.
(The limit is chosen not to be too big - the pcap and pcap-ng file formats allow up to 2^32-1, but some software might, when reading such a file, try to allocate a 2^32-1-byte buffer and fail.)
So, don't worry about it, but also don't explicitly specify a snapshot length of 65000 - explicit snapshot lengths are useful only if you want to capture only part of the packet or if you have an old tcpdump (or ANCIENT Ethereal) that defaults to 68 or 96 bytes, and, in the latter case, you might as well go with a value >= 65535.
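If you would rather check than take the warning on faith, a capture file records both the captured length and the original length of every packet, so truly truncated packets can be counted directly. The following Python sketch does that for the classic pcap format (not pcap-ng) with microsecond timestamps; find_truncated is a hypothetical helper name, and the sketch takes the whole file as bytes for simplicity.

```python
import struct


def find_truncated(pcap_bytes):
    """Return (snaplen, indexes of packets whose captured length < original length)."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"              # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"              # file written on a big-endian machine
    else:
        raise ValueError("not a classic pcap file with microsecond timestamps")
    # 24-byte global header: magic, version major/minor, thiszone, sigfigs,
    # snaplen (at offset 16), link type.
    snaplen = struct.unpack_from(endian + "I", pcap_bytes, 16)[0]
    truncated = []
    pos, index = 24, 0
    while pos + 16 <= len(pcap_bytes):
        # 16-byte record header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, orig_len = struct.unpack_from(endian + "IIII", pcap_bytes, pos)
        if incl_len < orig_len:
            truncated.append(index)
        pos += 16 + incl_len
        index += 1
    return snaplen, truncated
```

An empty list of indexes means no packet in the file was actually cut short, whatever snapshot length the header advertises.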

Socket, read until \n. What if bytes in message happen to be \n and end reading prematurely?

I plan to have a socket reading data until it gets to the \n character (In order to read individual messages from a data stream). What happens, however, if the message you're sending happens to have bytes that match the \n character? Won't that end reading prematurely and mess up everything? How do people usually read until a certain part in their data?
Ok, Joe provided a lot of good alternatives to reading till the "\n" character. (Now that I think of it, \n is probably only used for text based things)
Quote Joe: There's a lot of choices here. 1) substitute something for the delimiter when it's part of the content. 2) pick a less likely delimiter than \n. 3) include a message length ahead of each message that you parse out.
And #3 seems like the best way to accomplish what I'm trying to do.

How to tell the TCP server that the particular message has ended?

TCP client sends data byte by byte. So, how to tell the server that this message has ended and the new message begins now?
One way is to fix a special character that'll be sent as a bookmark, but that character can also be a part of the message causing confusions.
Is there a better way?
If the message is binary, delimited encoding using a special character is not possible. Tag Length Value (TLV) encoding will be best suited for this.
For example:
+--------+----------+----------------+
| Tag    | Length   | Content        |
| 0x0001 | 0x000C   | "HELLO, WORLD" |
+--------+----------+----------------+
In addition, the tag lets you have more than one message type.
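As a rough illustration of the TLV layout shown above, here is a Python sketch that encodes and decodes such messages. It assumes a 2-byte tag and a 2-byte length, both big-endian (the diagram gives the field values but not the byte order, so that part is an assumption), and the function names are made up for this example.

```python
import struct


def encode_tlv(tag, content):
    """Build one message: 2-byte tag, 2-byte length, then the content bytes."""
    return struct.pack(">HH", tag, len(content)) + content


def decode_tlvs(buf):
    """Return (messages, remainder): complete (tag, content) pairs plus leftover bytes."""
    messages = []
    pos = 0
    while len(buf) - pos >= 4:
        tag, length = struct.unpack_from(">HH", buf, pos)
        if len(buf) - pos - 4 < length:
            break  # content not fully received yet; wait for more data
        messages.append((tag, buf[pos + 4:pos + 4 + length]))
        pos += 4 + length
    return messages, buf[pos:]
```

The remainder would be kept and prepended to the next chunk read from the socket, so a message split across several reads is still decoded once it is complete.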
One possible way is to send the number of bytes in the message before the message itself. Once the receiving side has received that many bytes, it can start receiving the next message.
Check out the implementation used in networkComms.net, the open source communication framework. In particular IncomingPacketHandleHandOff() on line 892 here.
It guarantees that the first byte received specifies the size of a packet header (Less than 255 bytes). Once enough bytes have been received in order to rebuild the header, the header can be inspected to determine remaining size to be received (data section). If you have more incoming bytes than the expected header and data sections you look at the very first byte and start over.
Marker characters are used at the base level of the network stack, but they must be implemented carefully to avoid further complications.
If you wish to use a character both as the end-of-message marker and as part of the message, you need to use an escape sequence.
For example: Use the character '$' to end the message, and '%' to escape
i.e.
%$ -> $
%% -> %
then use '$' to end the message
Alternatively, send the number of bytes to be received at the start of the message (or message chunk, if you do not know the length of the complete message at that point).
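The '$' / '%' escaping scheme described above could be sketched in Python roughly as follows. The function names are hypothetical, and the sketch works on bytes so that it applies equally to text and binary payloads.

```python
END, ESC = b"$", b"%"


def escape_message(payload):
    """Escape '$' and '%' inside the payload, then append the '$' terminator."""
    out = bytearray()
    for byte in payload:
        b = bytes([byte])
        if b in (END, ESC):
            out += ESC            # "%$" stands for "$", "%%" stands for "%"
        out += b
    return bytes(out) + END


def unescape_stream(data):
    """Return (messages, remainder) for a buffer of escaped, '$'-terminated messages."""
    messages, current = [], bytearray()
    last_end = 0                  # offset just past the last complete message
    i = 0
    while i < len(data):
        b = data[i:i + 1]
        if b == ESC:
            if i + 1 >= len(data):
                break             # escape byte split across reads; wait for more
            current += data[i + 1:i + 2]
            i += 2
        elif b == END:
            messages.append(bytes(current))
            current.clear()
            i += 1
            last_end = i
        else:
            current += b
            i += 1
    return messages, data[last_end:]
```

The remainder is the still-escaped tail of an incomplete message; it would be kept and passed in again together with the next chunk received.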