Unicode and network communication

I am planning to develop a Windows-based client app and a platform-agnostic server app. The client app basically sends messages to the server app, and those messages can be in English or in other languages. Should I be using Unicode to encode messages in my client app? What is the general practice among applications involved in network communication? My client and server apps will use a custom protocol for exchanging messages over TCP/IP. Which Unicode encodings do Windows and UNIX platforms support by default? Should I also exchange the encoding type in my protocol so the receiver can decode the Unicode messages? Please advise.

Look at UTF-8, the encoding of Unicode in 8-bit code units; it is efficient for English and Western languages.
It is always a good idea to exchange the encoding type, in case you want to support something else at a later stage.
UTF-8 is supported by all major OSes and programming languages.
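A minimal sketch of what exchanging the encoding type could look like in a custom TCP protocol, assuming a hypothetical frame layout: a 1-byte encoding ID, a 4-byte big-endian payload length, then the encoded text. The ID values and the `encode_frame`/`decode_frame` names are illustrative only, not part of any standard.

```python
import struct

# Hypothetical encoding IDs for this sketch; client and server must agree on them.
ENCODINGS = {1: "utf-8", 2: "utf-16-be"}
ENCODING_IDS = {name: code for code, name in ENCODINGS.items()}

# Frame header: 1-byte encoding ID followed by a 4-byte big-endian payload length.
HEADER = struct.Struct("!BI")


def encode_frame(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text and prefix it with its encoding ID and payload length."""
    payload = text.encode(encoding)
    return HEADER.pack(ENCODING_IDS[encoding], len(payload)) + payload


def decode_frame(frame: bytes) -> str:
    """Parse the header, then decode the payload with the advertised encoding."""
    enc_id, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return payload.decode(ENCODINGS[enc_id])


message = "Hello, Wörld, こんにちは"
frame = encode_frame(message)
assert decode_frame(frame) == message
print(frame[:5].hex())  # header bytes: encoding ID + payload length
```

On a real socket you would first read exactly `HEADER.size` bytes and then exactly `length` bytes, since TCP is a byte stream and does not preserve message boundaries.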

If you control both the server and the client, I'd pick one encoding and stick with it.
I would suggest either UTF-8 (most efficient for English and Western languages) or UTF-16 (make sure to choose a byte order).

You can use whatever encoding you want, you just have to be careful about things like byte order. Windows internally uses UTF-16 (little-endian), so if you expect most systems to be Windows, then you should probably go with that. Otherwise, I'd recommend UTF-8, which doesn't have byte-order issues to worry about.
If you do go with UTF-16 (or UTF-32, which I definitely would not recommend), spell out in no uncertain terms what the endianness of the data on the wire is. Then, for every client which reads or writes a Unicode character to a network socket, convert from the platform's native endianness to the network endianness - this is either a no-op or a byte swap.
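A small Python sketch of the byte-order point, assuming big-endian UTF-16 was chosen as the wire format. Python's `utf-16-be` codec writes no byte order mark, whereas the plain `utf-16` codec prepends a BOM in host order. The `to_network_order` and `host_utf16` helpers are hypothetical and simply mirror the "no-op or byte swap" description.

```python
import sys


def host_utf16(text: str) -> bytes:
    """Encode in the host's native UTF-16 byte order, without a BOM."""
    return text.encode("utf-16-le" if sys.byteorder == "little" else "utf-16-be")


def to_network_order(native_utf16: bytes) -> bytes:
    """Convert host-order UTF-16 code units to big-endian: a no-op or a byte swap."""
    if sys.byteorder == "big":
        return native_utf16
    swapped = bytearray(native_utf16)
    # Swap the two bytes of every 16-bit code unit (surrogate halves included).
    swapped[0::2], swapped[1::2] = native_utf16[1::2], native_utf16[0::2]
    return bytes(swapped)


text = "Grüße 😀"
wire = to_network_order(host_utf16(text))
assert wire == text.encode("utf-16-be")
print(wire.hex())
# The generic codec adds a BOM in host order (b'\xff\xfe' on little-endian hosts).
print(text.encode("utf-16")[:2])
```

Because the swap works on 16-bit code units rather than characters, it is also correct for characters outside the BMP, which UTF-16 represents as surrogate pairs.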

Related

Using messaging apps to transfer binary data

I was wondering if it's possible to transfer arbitrary binary data through messaging apps such as Telegram. I guess the question is whether binary data can be transferred through text messages. I read somewhere that this is possible if Base64 binary-to-text encoding is used. Telegram is not censored in the country I'm living in, so if I can relay binary data through Telegram, it could be used to bypass censorship. Does Telegram support Base64 encoding? What are your thoughts on this?
Well, I'm almost certain that Base64 encoding can be used to carry binary data over Telegram. However, Telegram limits the rate at which messages can be sent. Therefore, the idea of using Telegram as a proxy is not achievable, since it would require sending a very large number of messages.
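A sketch of the Base64 approach, assuming a per-message limit of 4096 characters (the limit the Telegram Bot API documents for `sendMessage` text). It only covers encoding and chunking; actually sending the chunks through Telegram is not shown, and the function names are hypothetical.

```python
import base64
from typing import List

# Assumed per-message character limit; adjust for the channel actually used.
MAX_MESSAGE_CHARS = 4096


def to_text_messages(data: bytes, limit: int = MAX_MESSAGE_CHARS) -> List[str]:
    """Base64-encode binary data and split the text into message-sized chunks."""
    text = base64.b64encode(data).decode("ascii")
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def from_text_messages(messages: List[str]) -> bytes:
    """Join the chunks (in their original order) and decode back to bytes."""
    return base64.b64decode("".join(messages))


payload = bytes(range(256)) * 100
chunks = to_text_messages(payload)
assert from_text_messages(chunks) == payload
print(len(chunks), "messages")
```

Base64 turns every 3 bytes into 4 characters, so 1 MiB (1,048,576 bytes) becomes 1,398,104 characters, i.e. 342 messages of at most 4096 characters. That overhead is why the rate limit quickly becomes the bottleneck.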

Position of MIME in the Networking stack

Based on what I found on the internet, MIME (Multipurpose Internet Mail Extensions, now Internet Media Type (?)) is a way to describe file types (a header used by several protocols).
So MIME itself is not a protocol, but rather an extension used by other protocols, right?
This means that the extension is used at the application layer by the applications with no protocol doing anything other than carrying the MIME header.
So, if I send a mail with an MP3 attachment, does SMTP (or another application-layer protocol) recognize that this is an MP3 attachment, or is it solely the application's duty to recognize the file? In that sense, MIME cannot be called an extension to SMTP, but rather a feature to be used by applications.
If SMTP does not recognize that this is a different kind of file, how will the mail server store it properly? (E.g. an MPEG video file needs a particular format to be stored; how will the mail server store it without giving it any special treatment?)
Sorry if my questions sound a bit vague but I want to get an idea of how different protocols (especially, SMTP) use MIME.
Thanks for your help.
RFC 822 email was originally purely plain-text, 7-bit US-ASCII. MIME specifies a facility for encapsulating other media types in email containers. It does not specify any changes to SMTP (although e.g. the 8BITMIME ESMTP extension is useful for simplifying transport of MIME messages). Thus, it is an extension of an existing protocol, not a distinct protocol in its own right. This is also demonstrated by the fact that other protocols -- notably, HTTP -- have incorporated (parts of) MIME for tagging of content types and encodings.
An Internet Media Type is only one aspect of what MIME used to codify; the mechanisms for specifying character sets and encodings are still defined in MIME proper.
Traditionally, the mail server simply stores the bare RFC 822 message in its message store; it is the responsibility of the mail client to parse and possibly manipulate any MIME structure in the body for display and interaction. (The fact that RFC 822 has been superseded by RFC 2822 and then RFC 5322 has not fundamentally changed the actual mail message format.)
Some servers deviate from this model; for example, Microsoft Exchange seems to parse all incoming messages in order to borg them into its internal format, somewhat to the detriment of its interoperability with standard tools, and the sanity of those few of us who require reliable, felicitous access to our actual email.
The SMTP protocol itself knows nothing about the MIME format. The SMTP server does have to implement at least basic RFC 822 support in order to add the Received headers; however, it does not need to implement MIME.
How does the server save the file to disk? The same way it received it from the client over the TCP/IP stream: it just saves the raw bytes that were sent (plus the Received header I mentioned).
In other words, you are way over-thinking this. The SMTP server doesn't have to know anything about mp3 file attachments or anything else because the MIME format (it's not a protocol) is just a way to serialize the mp3 data in a message.
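To make "just a way to serialize the MP3 data" concrete, here is a sketch using Python's standard `email` package. The attachment bytes are a stand-in for a real MP3 and the addresses are placeholders. It shows that the message handed to an SMTP server is plain 7-bit text (the attachment is Base64-encoded inside a MIME part), and that it is the receiving client that parses the MIME structure back out.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Stand-in for real MP3 data; the mail system never inspects these bytes.
fake_mp3 = bytes(range(256))

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Song"
msg.set_content("See attachment.")
# Turns the message into multipart/mixed with a Base64-encoded audio/mpeg part.
msg.add_attachment(fake_mp3, maintype="audio", subtype="mpeg", filename="song.mp3")

# This is what an SMTP server receives after DATA: ordinary ASCII text lines.
wire = msg.as_bytes(policy=policy.SMTP)
print(wire.decode("ascii"))

# The mail client, not the server, parses the MIME structure.
parsed = BytesParser(policy=policy.default).parsebytes(wire)
for part in parsed.iter_attachments():
    print(part.get_content_type(), part.get_filename(), len(part.get_content()))
```

The server would store `wire` as-is (plus its own Received header); nothing in it requires the server to know that one part is audio.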

Is it safe to send 8-bit emails?

I would like to know if it is safe to send emails with 8-bit characters or if it is still needed to use quoted-printable or base64 encoding.
The 8BITMIME extension is now 20 years old. Are there SMTP servers or mail clients that still are not 8-bit clean? Is there any impact on email deliverability when sending 8-bit emails?
I did not find any numbers but it looks like it is now quite safe to send emails with 8-bit body. But since the big players like Gmail still encode emails there might be some servers that still are not 8-bit clean.
However, while sending an email with an 8-bit body might be safe, sending it with 8-bit headers is not.
RFC 2822, which was the standard until late 2008, prohibited non-ASCII characters in headers.
RFC 6532 proposed a standard for 8-bit headers but it is quite recent (2012) and does not seem widely implemented yet.
So sending unencoded 8-bit emails is currently not safe.
There are still SMTP servers that haven't been updated to support 8BITMIME, so yes, you still need to check for the extension.
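A sketch of both points with Python's standard library: checking whether a server advertises 8BITMIME in its EHLO response, and building a message whose body goes out as raw 8-bit only when that extension is available (quoted-printable otherwise). Non-ASCII headers are emitted as RFC 2047 encoded-words either way. The addresses and function names are placeholders.

```python
import smtplib
from email import policy
from email.message import EmailMessage


def server_supports_8bitmime(host: str, port: int = 25) -> bool:
    """Ask the server (via EHLO) whether it advertises the 8BITMIME extension."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.ehlo()
        return smtp.has_extn("8bitmime")


def build_message(body: str, subject: str, eight_bit_ok: bool) -> EmailMessage:
    msg = EmailMessage(policy=policy.SMTP)
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    # Non-ASCII header text is serialized as an encoded-word, never raw 8-bit.
    msg["Subject"] = subject
    # Raw 8-bit body only if the server advertised 8BITMIME.
    cte = "8bit" if eight_bit_ok else "quoted-printable"
    msg.set_content(body, charset="utf-8", cte=cte)
    return msg


msg = build_message("Café", "Grüße", eight_bit_ok=False)
print(msg.as_bytes().decode("ascii"))
```

When sending the 8-bit variant, the envelope should declare it too, e.g. `smtp.send_message(msg, mail_options=["BODY=8BITMIME"])`.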

Confusion regarding XMPP XEP-0065 and XEP-0096

I am currently working with XMPPFramework; the requirement is to transfer files between two iPhones. I searched the XEPs and found XEP-0065 and XEP-0096.
XEP-0065 says:
XMPP is designed for sending relatively small chunks of XML between network entities and is not designed for sending binary data. However, sometimes it is desirable to send binary data to another entity that one has discovered on the XMPP network (e.g., to send a file). Therefore it is valuable to have a generic protocol for streaming binary data between any two entities on an XMPP network. The main application for such a bytestreaming technology is file transfer as specified in SI File Transfer [1] and Jingle File Transfer [2]. However, other applications are possible, which is why it is important to develop a generic protocol rather than one that is specialized for a particular application such as file transfer.
Please see the sentence about file transfer ("The main application for such a bytestreaming technology..."); it confuses me. If the file transfer XEPs are SI File Transfer (XEP-0096) and Jingle File Transfer (XEP-0234), then what is the purpose of XEP-0065? Why do people on the net refer to XEP-0065 for file transfer?
In XMPP there are different protocols (XEPs) for file transfer: Jingle, Bytestreams, OOB, IBB, ...
The purpose of XEP-0096 is stream initiation, so it is built on top of the other file transfer protocols to enable seamless file transfers.
It is used by two clients to agree on one of the above file transfer protocols for a transfer, and also to find a fallback method if that one fails for any reason.
Alex
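XEP-0065 (SOCKS5 Bytestreams) allows direct connections, but when the two devices cannot reach each other directly (e.g. behind NAT), the transfer goes through a proxy: you will need such a proxy in your infrastructure, unless you use a public one.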
XEP-0096 is much more complex, so I wouldn't recommend it for a start, although I would recommend it if you later make extensive use of large binary transfers/exchanges, as Jingle is used for VoIP at least.
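For reference, a sketch of the byte-level SOCKS5 part of XEP-0065. As that spec describes it, the client does not connect to a real host name; it sends, as DST.ADDR, the hex-encoded SHA-1 of SID + requester JID + target JID (full JIDs), with port 0, in a standard SOCKS5 (RFC 1928) CONNECT request using the domain-name address type. The IQ negotiation of stream hosts is not shown, and the SID and JIDs below are made-up examples.

```python
import hashlib
import struct

# SOCKS5 greeting: version 5, one auth method offered, 0x00 = no authentication.
GREETING = bytes([5, 1, 0])


def socks5_dst_addr(sid: str, requester_jid: str, target_jid: str) -> str:
    """DST.ADDR for XEP-0065: hex SHA-1 of SID + requester JID + target JID."""
    return hashlib.sha1((sid + requester_jid + target_jid).encode("utf-8")).hexdigest()


def socks5_connect_request(dst_addr: str) -> bytes:
    """SOCKS5 CONNECT: VER=5, CMD=1, RSV=0, ATYP=3 (domain), len, addr, port 0."""
    addr = dst_addr.encode("ascii")
    return struct.pack("!BBBBB", 5, 1, 0, 3, len(addr)) + addr + struct.pack("!H", 0)


addr = socks5_dst_addr("vxf9n471bn46", "romeo@example.com/phone", "juliet@example.org/phone")
request = socks5_connect_request(addr)
print(addr, request.hex())
```

Both the sender and the receiver connect to the proxy with the same DST.ADDR, which is how the proxy pairs the two connections.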

Alternative to Wireshark for raw Ethernet capture over USB-Ethernet adapter

(Apologies: I uninstalled and reinstalled WinPcap and now I can see the extra interface! Suggestion found in Wireshark FAQ. I leave the original question below.)
I use Wireshark to examine Ethernet packet contents at the byte level (in/out of custom FPGA-based hardware). I have a USB-Ethernet adapter to add a second Ethernet port to my laptop. It was a cheap Chinese device bought on eBay, but now that I've found an appropriate driver, it works OK. However, I see that, on Windows, WinPcap/Wireshark doesn't support Ethernet capture over USB.
While it would be nice if Wireshark could be made to work with USB capture, I'm really looking for an alternative way to grab the raw Ethernet bytes. I have some Perl scripts set up that operate on the raw frames output by tshark (the Wireshark command-line tool), and I could easily feed them from any stream of frames/bytes.
Is anyone doing something similar or is there a tidy way to output the raw bytes?
Sniffed raw USB bytes would be OK, but it would be nicer if someone has already programmed/scripted the extraction of the Ethernet frames. I'm using Perl, but any compiled app, or Python, C#, C++, etc. would be fine.
You mentioned Python: Scapy can do a LOT of raw packet things, so you might want to look at that. From their GitHub page:
Scapy is a powerful Python-based interactive packet manipulation program and library.
It is able to forge or decode packets of a wide number of protocols, send them on the wire, capture them, store or read them using pcap files, match requests and replies, and much more. It is designed to allow fast packet prototyping by using default values that work.
It can easily handle most classical tasks like scanning, tracerouting, probing, unit tests, attacks or network discovery (it can replace hping, 85% of nmap, arpspoof, arp-sk, arping, tcpdump, wireshark, p0f, etc.). It also performs very well at a lot of other specific tasks that most other tools can't handle, like sending invalid frames, injecting your own 802.11 frames, combining techniques (VLAN hopping+ARP cache poisoning, VoIP decoding on WEP protected channel, ...), etc.
Scapy supports Python 2.7 and Python 3 (3.3 to 3.6). It's intended to be cross platform, and runs on many different platforms (Linux, OSX, *BSD, and Windows).
Check them out at https://github.com/secdev/scapy
I don't have a Windows PC readily at hand to test, but as far as I can tell, there is no problem capturing Ethernet frames in Wireshark on Windows, from a USB-Ethernet adapter.
What you can't do is capture USB bus traffic, but that is not what you wanted, right?
To clarify, just select the USB-Ethernet device as you would any other, and you are set.
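Since the Perl scripts just need a stream of raw frames, here is a minimal stdlib-only sketch that pulls them out of a saved capture. It assumes the file is in classic libpcap format (e.g. written with `tshark -i <interface> -F pcap -w capture.pcap`; current Wireshark/dumpcap save pcapng by default, which this reader does not handle). It prints one line per frame, timestamp then hex bytes, so it can be piped into a script. The `read_pcap_frames` name is hypothetical.

```python
import struct
import sys
from typing import Iterator, Tuple

LINKTYPE_ETHERNET = 1

# Magic number (as stored on disk) -> (struct byte order, timestamp fraction divisor).
MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1e6),  # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1e6),  # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1e9),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1e9),  # big-endian, nanoseconds
}


def read_pcap_frames(path: str) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, raw Ethernet frame) pairs from a classic pcap file."""
    with open(path, "rb") as f:
        header = f.read(24)  # global header
        if len(header) < 24 or header[:4] not in MAGICS:
            raise ValueError("not a classic pcap file (pcapng?)")
        endian, ts_div = MAGICS[header[:4]]
        linktype = struct.unpack(endian + "I", header[20:24])[0]
        if linktype != LINKTYPE_ETHERNET:
            raise ValueError("link type %d is not Ethernet" % linktype)
        record = struct.Struct(endian + "IIII")  # ts_sec, ts_frac, incl_len, orig_len
        while True:
            rec = f.read(record.size)
            if len(rec) < record.size:
                return
            ts_sec, ts_frac, incl_len, _orig_len = record.unpack(rec)
            frame = f.read(incl_len)
            if len(frame) < incl_len:
                return  # truncated final record
            yield ts_sec + ts_frac / ts_div, frame


if __name__ == "__main__" and len(sys.argv) > 1:
    for ts, frame in read_pcap_frames(sys.argv[1]):
        print("%.6f %s" % (ts, frame.hex()))
```

Note that `incl_len` can be smaller than the frame on the wire if the capture used a snapshot length.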