FTP: How to direct the client to not modify the bytes it received? - sockets

I am trying to implement an FTP server in C. However, as I send bytes to the client, the client automatically modifies the bytes in a certain way.
For example, I notice that the client changes 0D0A to 0A (CR+LF to LF). There are some other mysterious changes as well.
Is there a way that from the server side I can direct the client to not change any bytes it received? Or do I have to modify the bytes that I send in order to adjust to the client's conventions?

The FTP program found on unix systems has an "ascii" mode (the default) and a "binary" mode which must be used for binary transfers.
Possibly the issue you are having is due to the use of ascii mode.
Try using the "binary" command to switch modes.

Related

Download multiple files from FTP server (using sockets)

I want to download multiple files from an FTP server using the sockets directly. I'm using Swift 5 with the BlueSocket library, which is basically a wrapper, so the commands are the same as if I did everything through e.g. Windows console.
FTP commands:
Login + connect cmdSocket
cmdSocket send: PASV
cmdSocket receive: 227 Entering Passive Mode
cmdSocket send: TYPE I
cmdSocket receive: 200 Type set to I
Connect dataSocket to Passive Mode IP/port
cmdSocket send: CWD myFolder
cmdSocket receive: 250 CWD command successful
Looping through all the files:
cmdSocket send: RETR myFileX
cmdSocket receive: Either "150 Downloading in BINARY file" or "125 Data connection already open; Transfer starting"
dataSocket: Receive data and save it to storage
cmdSocket receive: 226 Transfer complete
This works fine for the first file ("myFile1") but everything changes in the second loop iteration ("myFile2"):
cmdSocket send: RETR myFile2
cmdSocket receive: 150 Opening BINARY mode data connection.
Now the dataSocket won't return any bytes and sometimes it also receives "425 Cannot open data connection." in addition to "150". I tried to use "myFile1" twice but with the same result.
I'm guessing that the order is off but what is wrong exactly? Do I have to change the type for every file, so within the loop? Do I have to open a new data socket for every file or maybe send some "reset" command after "226" is received for the first file?
By default, FTP uses the STREAM transmission mode on data transfers. Under STREAM mode, End-Of-File is signaled by closing the data connection. As such, you can send only 1 file per data connection in STREAM mode.
To work around that, you will have to either:
issue a new PORT/PASV command to establish a new data connection for each individual file.
issue a MODE command to switch to BLOCK or COMPRESSED transmission mode before transferring files. Both modes do not signal EOF by closing the data connection, but rather by sending an explicit marker over the data connection at the end of each file, thus allowing multiple files to be transferred over a single data connection.
For more details, read the official FTP protocol specification, RFC 959, specifically sections 3.3 "DATA CONNECTION MANAGEMENT" and 3.4 "TRANSMISSION MODES":
3.3. DATA CONNECTION MANAGEMENT
Default Data Connection Ports: All FTP implementations must
support use of the default data connection ports, and only the
User-PI may initiate the use of non-default ports.
Negotiating Non-Default Data Ports: The User-PI may specify a
non-default user side data port with the PORT command. The
User-PI may request the server side to identify a non-default
server side data port with the PASV command. Since a connection
is defined by the pair of addresses, either of these actions is
enough to get a different data connection, still it is permitted
to do both commands to use new ports on both ends of the data
connection.
Reuse of the Data Connection: When using the stream mode of data
transfer the end of the file must be indicated by closing the
connection. This causes a problem if multiple files are to be
transfered in the session, due to need for TCP to hold the
connection record for a time out period to guarantee the reliable
communication. Thus the connection can not be reopened at once.
There are two solutions to this problem. The first is to
negotiate a non-default port. The second is to use another
transfer mode.
A comment on transfer modes. The stream transfer mode is
inherently unreliable, since one can not determine if the
connection closed prematurely or not. The other transfer modes
(Block, Compressed) do not close the connection to indicate the
end of file. They have enough FTP encoding that the data
connection can be parsed to determine the end of the file.
Thus using these modes one can leave the data connection open
for multiple file transfers.
3.4. TRANSMISSION MODES
The next consideration in transferring data is choosing the
appropriate transmission mode. There are three modes: one which
formats the data and allows for restart procedures; one which also
compresses the data for efficient transfer; and one which passes
the data with little or no processing. In this last case the mode
interacts with the structure attribute to determine the type of
processing. In the compressed mode, the representation type
determines the filler byte.
All data transfers must be completed with an end-of-file (EOF)
which may be explicitly stated or implied by the closing of the
data connection. For files with record structure, all the
end-of-record markers (EOR) are explicit, including the final one.
For files transmitted in page structure a "last-page" page type is
used.
NOTE: In the rest of this section, byte means "transfer byte"
except where explicitly stated otherwise.
For the purpose of standardized transfer, the sending host will
translate its internal end of line or end of record denotation
into the representation prescribed by the transfer mode and file
structure, and the receiving host will perform the inverse
translation to its internal denotation. An IBM Mainframe record
count field may not be recognized at another host, so the
end-of-record information may be transferred as a two byte control
code in Stream mode or as a flagged bit in a Block or Compressed
mode descriptor. End-of-line in an ASCII or EBCDIC file with no
record structure should be indicated by <CRLF> or <NL>,
respectively. Since these transformations imply extra work for
some systems, identical systems transferring non-record structured
text files might wish to use a binary representation and stream
mode for the transfer.
The following transmission modes are defined in FTP:
3.4.1. STREAM MODE
The data is transmitted as a stream of bytes. There is no
restriction on the representation type used; record structures
are allowed.
In a record structured file EOR and EOF will each be indicated
by a two-byte control code. The first byte of the control code
will be all ones, the escape character. The second byte will
have the low order bit on and zeros elsewhere for EOR and the
second low order bit on for EOF; that is, the byte will have
value 1 for EOR and value 2 for EOF. EOR and EOF may be
indicated together on the last byte transmitted by turning both
low order bits on (i.e., the value 3). If a byte of all ones
was intended to be sent as data, it should be repeated in the
second byte of the control code.
If the structure is a file structure, the EOF is indicated by
the sending host closing the data connection and all bytes are
data bytes.
3.4.2. BLOCK MODE
The file is transmitted as a series of data blocks preceded by
one or more header bytes. The header bytes contain a count
field, and descriptor code. The count field indicates the
total length of the data block in bytes, thus marking the
beginning of the next data block (there are no filler bits).
The descriptor code defines: last block in the file (EOF) last
block in the record (EOR), restart marker (see the Section on
Error Recovery and Restart) or suspect data (i.e., the data
being transferred is suspected of errors and is not reliable).
This last code is NOT intended for error control within FTP.
It is motivated by the desire of sites exchanging certain types
of data (e.g., seismic or weather data) to send and receive all
the data despite local errors (such as "magnetic tape read
errors"), but to indicate in the transmission that certain
portions are suspect). Record structures are allowed in this
mode, and any representation type may be used.
The header consists of the three bytes. Of the 24 bits of
header information, the 16 low order bits shall represent byte
count, and the 8 high order bits shall represent descriptor
codes as shown below.
Block Header
+----------------+----------------+----------------+
| Descriptor | Byte Count |
| 8 bits | 16 bits |
+----------------+----------------+----------------+
The descriptor codes are indicated by bit flags in the
descriptor byte. Four codes have been assigned, where each
code number is the decimal value of the corresponding bit in
the byte.
Code Meaning
128 End of data block is EOR
64 End of data block is EOF
32 Suspected errors in data block
16 Data block is a restart marker
With this encoding, more than one descriptor coded condition
may exist for a particular block. As many bits as necessary
may be flagged.
The restart marker is embedded in the data stream as an
integral number of 8-bit bytes representing printable
characters in the language being used over the control
connection (e.g., default--NVT-ASCII). <SP> (Space, in the
appropriate language) must not be used WITHIN a restart marker.
For example, to transmit a six-character marker, the following
would be sent:
+--------+--------+--------+
|Descrptr| Byte count |
|code= 16| = 6 |
+--------+--------+--------+
+--------+--------+--------+
| Marker | Marker | Marker |
| 8 bits | 8 bits | 8 bits |
+--------+--------+--------+
+--------+--------+--------+
| Marker | Marker | Marker |
| 8 bits | 8 bits | 8 bits |
+--------+--------+--------+
3.4.3. COMPRESSED MODE
There are three kinds of information to be sent: regular data,
sent in a byte string; compressed data, consisting of
replications or filler; and control information, sent in a
two-byte escape sequence. If n>0 bytes (up to 127) of regular
data are sent, these n bytes are preceded by a byte with the
left-most bit set to 0 and the right-most 7 bits containing the
number n.
Byte string:
1 7 8 8
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|0| n | | d(1) | ... | d(n) |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
^ ^
|---n bytes---|
of data
String of n data bytes d(1),..., d(n)
Count n must be positive.
To compress a string of n replications of the data byte d, the
following 2 bytes are sent:
Replicated Byte:
2 6 8
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|1 0| n | | d |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
A string of n filler bytes can be compressed into a single
byte, where the filler byte varies with the representation
type. If the type is ASCII or EBCDIC the filler byte is <SP>
(Space, ASCII code 32, EBCDIC code 64). If the type is Image
or Local byte the filler is a zero byte.
Filler String:
2 6
+-+-+-+-+-+-+-+-+
|1 1| n |
+-+-+-+-+-+-+-+-+
The escape sequence is a double byte, the first of which is the
escape byte (all zeros) and the second of which contains
descriptor codes as defined in Block mode. The descriptor
codes have the same meaning as in Block mode and apply to the
succeeding string of bytes.
Compressed mode is useful for obtaining increased bandwidth on
very large network transmissions at a little extra CPU cost.
It can be most effectively used to reduce the size of printer
files such as those generated by RJE hosts.
FTP works like the following:
You setup a TCP connection to the FTP server.
Over that socket you can send commands, and the server will reply over that same socket.
You can use the port command :
Client: PORT 192,168,1,2,7,139 //The client wants the server to send to port number 1931 on the client machine. 7 and 139 in hex = 078B = 1931
Server: 200 PORT command successful.
Client: RETR Yoyodyne.TXT //Download "Yoyodyne.TXT."
Server: 150 Opening ASCII mode data connection for Yoyodyne.TXT.The server now connects out from its port 20 on 172.16.62.36 to port 1931 on 192.168.1.2.
Server: 226 Transfer completed. //That succeeded, so the data is now sent over the established data connection. And the connection is closed
So a new RETR first needs a new PORT command. Or if not, it might be that the previous port command is persistent. But you will have to connect again.
Each file has its own socket where its data will be send/received. and the socket is closed after each file is sent. So you need a new socket each time, and a new connect.
more details here :
https://www.ncftp.com/libncftp/doc/ftp_overview.html

For .pcap file, whether a swapped magic_number 0xd4c3b2a1 effect the network packet?

For example, when the magic_number is 0xa1b2c3d4, the ether_type is 0x0008.
Then, if the magic_number is 0xd4c3b2a1, will the ether_type be 0x0800?
I only have .pcap files that the magic_number is 0xa1b2c3d4, so I cann't verify myself.
Or someone may upload a .pcap file whose magic_number is 0xd4c3b2a1, then I can analysis myself.
Thanks.
According to tcpdump manpage :
That allows software reading the file to determine
whether the byte order of the host that wrote the file is the same as
the byte order of the host on which the file is being read, and thus
whether the values in the per-file and per-packet headers need to be
byte-swapped.
I think the answer is yes.
The Ethernet type field is always in big-endian ("network") byte order in a packet, so, to read it, you must convert it from big-endian byte order to your machine's byte order, using, for example, ntohs().
This is independent of the byte order of the host that wrote the file; using ntohs() will work on all machines with all pcap files.
(Bear in mind, however, that there are machines on which you would have to fetch the Ethernet type field - or any other multi-byte integral field - a byte at a time and assemble those bytes into the value, because those machines trap on unaligned memory accesses. See, for example, the EXTRACT_ macros/functions in extract.h in the tcpdump source.)

How to send text in UTF-8 using Indy TIdTCPServer in c++ builder

My client j2me application reading text input stream using UTF-8
reader = new InputStreamReader(in,"UTF-8");
and my server when gets connected sends text using this statement
AContext->Connection->IOHandler->WriteLn(cxMemo1->Text,TEncoding::UTF8);
but result text showing weird characters like ?????????????????????????? ?????????????
Where I'm doing wrong?
also when i tried to load from utf-8 encoding data file in such a way
AContext->Connection->IOHandler->WriteFile("c:\\fids.xml");
it's all the same!
Indy 10 completely supports UTF-8 encoding. I've myself worked with it's TIdFTP component & successfully uploaded Unicode text files. From what I can make of it:
Your connection/transfer type is set to ftASCII rather than ftBinary.
Your J2ME applet/Host platform does not suport UTF-8
'?' characters occur when data is going through a Unicode-to-Ansi conversion to an Ansi charset that does not support the Unicode characters being converted.
What version of C++Builder are you using? In versions prior to CB2009, you should tell Indy the encoding of the AnsiString data that you are passing in. Indy defaults to ASCII (ie: TIdTextEncoding::ASCII) for most String-based operation. That can be overridden when needed, either with optional AAnsiEncoding parameters, the TIdIOHandler::DefAnsiEncoding property, or the global Idglobal::GIdDefaultAnsiEncoding setting. If you do not specify the correct encoding, the AnsiString data may not be converted to Unicode correctly before then being converted to UTF-8. For example:
AContext->Connection->IOHandler->WriteLn(cxMemo1->Text, TIdTextEncoding_UTF8, TTIdTextEncoding_Default);
Or:
AContext->Connection->IOHandler->DefAnsiEncoding = TIdTextEncoding_Default;
AContext->Connection->IOHandler->WriteLn(cxMemo1->Text, TIdTextEncoding_UTF8);
You can optionally also use the TIdIOHandler::DefStringEncoding property if you do not want to specify the UTF-8 encoding on every call:
AContext->Connection->IOHandler->DefStringEncoding = TIdTextEncoding_UTF8;
AContext->Connection->IOHandler->WriteLn(cxMemo1->Text);
Now, with that said, the fact that WriteFile() is also sending data that J2ME is not handling correctly tells me that Indy is not the root of the issue. WriteFile() simply dups the raw file data as-is to the connection without any interpretation at all. If you send a UTF-8 encoded file, then UTF-8 encoded octets will be sent to J2ME.
I suggest you use a packet sniffer, such as Wireshark, to verify the data that Indy is sending. That will tell you for sure whether Indy is really at fault or not.
*PS: notice in the examples above that I use Indy's TIdTextEncoding macros instead of TEncoding directly. This is because Indy's TIdTextEncoding logic works around some bugs in Embarcadero's TEncoding classes. Also, we're going to phase out direct support for TEncoding in Indy 11 and expand on TIdTextEncoding so Indy has more control than Embarcadero offers.

What means Zend_Mime::ENCODING_8BIT when sending mails with Zend_Mail?

In the example for Zend_Mail on http://framework.zend.com/manual/en/zend.mail.attachments.html they use ENCODING_8BIT but searching for what that might be sends me to http://msdn.microsoft.com/en-us/library/ms526992%28EXCHG.10%29.aspx were (and this sounds logical to me) it is explained that 8bit encoding does not make sense for emails.
Edit:
When I use this encoding for a mail with an attachment, I receive the mail with a corrupted attachment in my mail software (Thunderbird)
In which cases does it make sense to use ENCODING_8BIT?
As everybody said, ENCODING_8BIT represents the Content Transfer Encoding.
Basically, 8BITMIME is used for Internationalization. It's using a 8-bit character sets and therefore, allow you to send any character supported in the UTF8 charset.
In general, non-MIME mailers send 8-bit data but do not include any
MIME headers to mark the message as 8-bit data. MIME mailers should
cope with this without any problems. [source]
So basically there is not really a case where it makes sense to use ENCODING_8BIT over another encoding since emails in UTF8 are a standard today. Also, note that most of the MTAs (Message Transfer Agent, such as Postfix, etc.) automatically force the encoding to 8BITMIME (UTF-8).
Here is a good resource about the 8BITMIME encoding.
The 8BITMIME extension has two effects in practice:
The client will avoid Q-P conversion.
The client may add extra
information at the end of a MAIL request: a space followed by either
"BODY=7BIT" or "BODY=8BITMIME".
Zend_Mime::ENCODING_8BIT sets the Content-Transfer-Encoding.
The Content-Transfer-Encoding defines methods for representing binary data in ASCII text format.
The use of Zend_Mime::ENCODING_8BIT in the example is a Bug.
For sending Attachments you should always use Zend_Mime::ENCODING_BASE64
Not for email but for attachements. If you take a look on the RFC 2045 at page 7:
RFC2045
"Binary data" refers to data where any
sequence of octets whatsoever is
allowed.

RS-232C and Email in 7bit char set

The book "Designing Embedded Hardware" in the chapter "9.3. Old Faithful: RS-232C" mentions that emails are still sent in 7bit char set because of RS-232C:
It's also not unheard of to see
RS-232C systems still using 7-bit data
frames (another leftover from the
'60s), rather than the more common
8-bit. In fact, this is one of the reasons why you'll
still see email being sent on the
Internet limited to a 7-bit character
set, just in case the packets happen
to be routed via a serial connection
that supports only 7-bit
transmissions.
How can I confirm the observation?
Check out the spec. The original rfc822, for ARPA Internet Text Messages, explicitly states:
A message consists of header fields
and, optionally, a body. The body is
simply a sequence of lines containing
ASCII characters.
Since ASCII is 7-bit, voila.
Note, however, that there are a whole bunch of additions to that original spec, all the MIME extensions, which allow message header extensions for non-ascii text.
The Quoted-printable MIME encoding is specifically designed to encode 8-bit data in 7-bit characters. This encoding is widely used to encode email.
Note also that the text you quoted says "in case the packets happen to be routed via a serial connection" which is misleading, especially if they're talking in a context of IP packets. IP packets assume an 8-bit data path, and cannot be sent directly over a 7-bit RS-232 link without additional encoding (and then it's not a 7-bit data path anymore, it's 8-bit).
The systems that were restricted to 7 bits were already old when email first became popular. The chances that you will find one today approach zero.
Since certain characters have special meaning to email programs (most notably the end-of-line character), it still makes sense to limit the character set.