Some questions related to IPv6 - webserver

I am studying a simple web server written in C, and a few questions came up. How is IPv6 used with TCP? To use IPv6, do we have to use some modified version of TCP? If so, what do we have to change? I have read about little-endian and big-endian byte ordering, but I am not sure whether IPv6 needs any special handling there.

As you'll probably want the more gory details of the API changes, they're here: http://www.faqs.org/rfcs/rfc2553.html
Mostly it's a couple of larger address structures to pass in that can hold the longer addresses, plus a new address family and protocol constant so the API can distinguish which struct you are using. Byte ordering is the same.
The actual TCP SYN, SYN/ACK, ACK exchange and all that is identical; it is literally a different IP-layer packet with longer addresses and a few other header changes.
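To make that concrete, here is a minimal sketch in C of an IPv6 TCP listener; the port number and the trimmed error handling are only for illustration. Note that only the address family and the sockaddr struct differ from the IPv4 version; the TCP calls themselves are unchanged:

/* Minimal sketch of an IPv6 TCP listener (port 8080 is arbitrary). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET6, SOCK_STREAM, 0);      /* AF_INET6 instead of AF_INET */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in6 addr;                       /* bigger struct, 128-bit address */
    memset(&addr, 0, sizeof addr);
    addr.sin6_family = AF_INET6;
    addr.sin6_port   = htons(8080);                 /* port is still big-endian, as with IPv4 */
    addr.sin6_addr   = in6addr_any;

    if (bind(fd, (struct sockaddr *) &addr, sizeof addr) < 0) { perror("bind"); return 1; }
    if (listen(fd, 16) < 0) { perror("listen"); return 1; }

    int client = accept(fd, NULL, NULL);            /* the TCP handshake itself is unchanged */
    if (client >= 0) {
        const char msg[] = "hello over IPv6\r\n";
        write(client, msg, sizeof msg - 1);
        close(client);
    }
    close(fd);
    return 0;
}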

Related

How do I get useful data from a UDP socket using GNAT.Sockets in Ada?

Summary:
I am writing a server in Ada that should listen and reply to messages received over UDP. I am using the GNAT.Sockets library and have created a socket and bound it to a port. However, I am not sure how to listen for and receive messages on the socket. The Listen_Socket function is for TCP sockets, and it seems that using Stream with UDP sockets is not recommended. I have seen the Receive_Socket and Receive_Vector procedures as alternatives, but I am not sure how to use them or how to convert their output to a usable format.
More details:
I am writing a server that should reply to messages that it gets over UDP. A minimal example of what I have so far would look like this:
with GNAT.Sockets; use GNAT.Sockets;

procedure UDP is
   sock   : Socket_Type;
   family : Family_Type := Family_Inet;
   port   : Port_Type := 12345;
   addr   : Sock_Addr_Type (family);
begin
   Create_Socket (sock, family, Socket_Datagram);
   addr.Addr := Any_Inet_Addr;
   addr.Port := port;
   Bind_Socket (sock, addr);
   -- Listen_Socket (sock); -- A TCP thing, not for UDP.
   -- now what?
end UDP;
For a TCP socket, I can listen, accept, then use the Stream function to get a nice way to read the data (as in 'Read and 'Input). While the Stream function still exists, I have found an archive of a ten-year-old comp.lang.ada thread in which multiple people say not to use streams with UDP.
Looking in g-socket.ads, I do see alternatives: the Receive_Socket and Receive_Vector procedures. However, the output of the former is a Stream_Element_Array (with an offset indicating the length), and the latter produces something similar, just with a length associated with each Stream_Element.
According to https://stackoverflow.com/a/40045312/7105391, the way to turn these types into a stream is to not get them in the first place and instead get a stream directly, which is not particularly helpful here.
In a GitHub gist I found, Unchecked_Conversion is used to turn the arrays into strings and vice versa, but given that the reference manual (13.13.1) says that type Stream_Element is mod <implementation-defined>;, I'm not entirely comfortable with that approach.
All in all, I'm pretty confused about how I'm supposed to do this. I'm even more confused about the lack of examples online, as this should be a pretty basic thing to do.

SSL vs BIO object in openSSL [duplicate]

I've been reading a lot about OpenSSL, specifically the TLS and DTLS APIs. Most of it makes sense, it's a pretty intuitive API once you understand it. One thing has really got me scratching my head though...
When/why would I use BIOs?
For example, this wiki page demonstrates setting up a barebones TLS server. There isn't even a mention of BIOs anywhere in the example.
Now this page uses BIOs exclusively, never touching the read and write functions of the SSL struct. Granted it's from 2013, but it's not the only one that uses BIOs.
To make it even more confusing this man page suggests that the SSL struct has an "underlying BIO" without ever needing to set it explicitly.
So why would I use BIOs if I can get away with using SSL_read() and SSL_write()? What are the advantages? Why do some examples use BIOs and others don't? What Is the Airspeed Velocity of an Unladen Swallow?
BIOs are always there, but they might be hidden by the simpler interface. Directly using the BIO interface is useful if you want more control, at the cost of more effort. If you just want to use TLS on a TCP socket, then the simple interface is usually sufficient. If you instead want to run TLS over your own underlying transport layer, or if you want more control over how it interacts with the transport layer, then you need BIOs.
An example of such a use case is this proposal, where TLS is tunneled as JSON inside HTTPS, i.e. the TLS frames are encoded in JSON, which is then transferred using POST requests and responses. This can be achieved by handling the TLS with memory BIOs, whose contents are then encoded to and decoded from JSON.
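As a rough sketch (not taken from that proposal), driving the handshake through memory BIOs can look roughly like this; send_blob and recv_blob are stand-ins for whatever transport you actually use:

#include <openssl/bio.h>
#include <openssl/ssl.h>

/* Hypothetical transport hooks (e.g. the JSON/POST tunnel described above). */
extern int send_blob(const unsigned char *buf, int len);
extern int recv_blob(unsigned char *buf, int len);

/* Drive a client-side TLS handshake through memory BIOs; the application
   shuttles the ciphertext over whatever transport it wants. */
int handshake_over_memory_bios(SSL_CTX *ctx)
{
    BIO *rbio = BIO_new(BIO_s_mem());   /* the library reads ciphertext from here */
    BIO *wbio = BIO_new(BIO_s_mem());   /* the library writes ciphertext here */
    SSL *ssl  = SSL_new(ctx);

    SSL_set_bio(ssl, rbio, wbio);       /* the SSL object takes ownership of both BIOs */
    SSL_set_connect_state(ssl);         /* act as the client side */

    for (;;) {
        int rc = SSL_do_handshake(ssl);
        if (rc == 1)
            break;                      /* handshake finished */

        int err = SSL_get_error(ssl, rc);
        if (err != SSL_ERROR_WANT_READ && err != SSL_ERROR_WANT_WRITE)
            break;                      /* fatal error: give up in this sketch */

        unsigned char buf[4096];
        int n;

        /* flush outgoing ciphertext to our transport */
        while ((n = BIO_read(wbio, buf, sizeof buf)) > 0)
            send_blob(buf, n);

        /* feed incoming ciphertext back to the library */
        n = recv_blob(buf, sizeof buf);
        if (n <= 0)
            break;
        BIO_write(rbio, buf, n);
    }

    int ok = SSL_is_init_finished(ssl);
    SSL_free(ssl);                      /* also frees the BIOs it owns */
    return ok;
}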
First, your Q is not very clear. SSL is (a typedef for) a C struct type, and you can't use the dot operator on a struct type in C, only an instance. Even assuming you meant 'an instance of SSL', as people sometimes do, in older versions (through 1.0.2) it did not have members read and write, and in 1.1.0 up it is opaque -- you don't even know what its members are.
Second, there are two different levels of BIO usage applicable to the SSL library. The SSL/TLS connection (represented by the SSL object, plus some related things linked to it like the session) always uses two BIOs to respectively send and receive protocol data -- including both protocol data that contains the application data you send with SSL_write and receive with SSL_read, and the SSL/TLS handshake that is handled within the library. Much as Steffen describes, these normally are both set to a socket-BIO that sends to and receives from the appropriate remote host process, but they can instead be set to BIOs that do something else in-between, or even instead. (This normal case is automatically created by SSL_set_{,r,w}fd which it should be noted on Windows actually takes a socket handle -- but not any other file handle; only on Unix are socket descriptors semi-interchangeable with file descriptors.)
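To illustrate the 'normal case' just mentioned, here is a rough sketch of the shortcut next to the explicit form (not a complete program):

#include <openssl/bio.h>
#include <openssl/ssl.h>

/* The usual shortcut: the library creates the socket BIO for you. */
void attach_with_set_fd(SSL *ssl, int sockfd)
{
    SSL_set_fd(ssl, sockfd);
}

/* Roughly what that does under the hood: create a socket BIO explicitly
   and install it as both the read and write BIO. */
void attach_with_explicit_bio(SSL *ssl, int sockfd)
{
    BIO *bio = BIO_new_socket(sockfd, BIO_NOCLOSE);
    SSL_set_bio(ssl, bio, bio);
}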
Separately, the SSL/TLS connection itself can be 'wrapped' in an ssl-BIO. This allows an application to handle an SSL/TLS connection using mostly the same API calls as a plain TCP connection (using a socket-BIO) or a local file, as well as the provided 'filter' BIOs like a digest (md) BIO or a base64 encoding/decoding BIO, and any additional BIOs you add. This is the case for the IBM webpage you linked (which is for a client not a server BTW). This is similar to the Unix 'everything is (mostly) a file' philosophy, where for example the utility program grep, by simply calling read on fd 0, can search data from a file, the terminal, a pipe from another program, or (if run under inetd or similar) from a remote system using TCP (but not SSL/TLS, because that isn't in the OS). I haven't encountered many cases where it is particularly beneficial to be able to easily interchange SSL/TLS data with some other type of source/sink, but OpenSSL does provide the ability.
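And a minimal sketch of that 'wrapped' style, in the spirit of the client examples (example.com:443 is only a placeholder, and certificate checking is omitted):

#include <openssl/bio.h>
#include <openssl/ssl.h>
#include <stdio.h>

/* Treat a TLS connection as just another BIO in a chain. */
int fetch_over_ssl_bio(SSL_CTX *ctx)
{
    BIO *bio = BIO_new_ssl_connect(ctx);        /* ssl-BIO chained to a connect-BIO */
    BIO_set_conn_hostname(bio, "example.com:443");

    if (BIO_do_connect(bio) <= 0) {             /* TCP connect + TLS handshake */
        BIO_free_all(bio);
        return -1;
    }

    BIO_puts(bio, "GET / HTTP/1.0\r\nHost: example.com\r\n\r\n");

    char buf[1024];
    int n;
    while ((n = BIO_read(bio, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t) n, stdout);     /* same read loop as a plain socket BIO */

    BIO_free_all(bio);
    return 0;
}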

What is the level parameter in getsockopt?

I got the following link: SOL_SOCKET in getsockopt()
But it is really confusing for me. One answer said that SOL_SOCKET means the socket layer. What is the socket layer? Are there any other options available for that parameter?
What happens if we pass the SOL_SOCKET parameter and what does the SOL stand for?
I am using UNIX.
"socket layers" refers to the socket abstraction of the operative system. Those options can be set independently of the type of socket you are handling. In practice, you may be only interested in TCP/IP sockets, but there are also UDP/IP sockets, Unix domain sockets, and others. The options related to SOL_SOCKET can be applied to any of them. The list provided in the answer of the other question has some of them; in the manual page of sockets there are even more, under the "Socket options" section.
SOL_SOCKET is a constant for the "protocol number" associated with that level. For other protocols or levels, you can use getprotoent to obtain the protocol number from its name, or check the manual of the protocol - for example, the manual page of IP describes the constants for the protocol numbers of IP (IPPROTO_IP), TCP (IPPROTO_TCP) and UDP (IPPROTO_UDP), while the manual page of Unix sockets says that, for historical reasons, its protocol options must be set using SOL_SOCKET too. Moreover, you can find the list of supported protocols for your system in /etc/protocols. And, of course, the options supported by each of the protocols are in their manuals: IP, TCP, UDP, Unix sockets...
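For example, a small C program that queries one socket-level option and one TCP-level option (just a sketch; error handling omitted):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Socket-layer option: works for any socket type. */
    int rcvbuf = 0;
    socklen_t len = sizeof rcvbuf;
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);
    printf("SO_RCVBUF = %d\n", rcvbuf);

    /* Protocol-level option: the level is the TCP protocol number. */
    int nodelay = 0;
    len = sizeof nodelay;
    getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &nodelay, &len);
    printf("TCP_NODELAY = %d\n", nodelay);

    close(fd);
    return 0;
}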

getting local host name as destination using getaddrinfo/getnameinfo

I am looking through some out-of-date code which uses getaddrinfo and getnameinfo to determine host name information and then falls back to gethostname and gethostbyname if getnameinfo fails.
Now, this seems wrong to me. I am trying to understand the intent of the code so that I can make a recommendation. I don't want to repost the entire code here because it is long and complicated, but I'll try to summarize:
As far as I can tell, the point of this code is to generate a string which can be used by another process to connect to a listening socket. This seems to be not just for local processes, but also for remote hosts to connect back to this computer.
So the code in question is basically doing the following:
getaddrinfo(node = NULL, service = port, hints.ai_flags = AI_PASSIVE, ai); -- this gets a list of possible arguments for socket() that can be used with bind().
go through the list of results and create a socket.
first time a socket is successfully created, this is selected as the "used" addrinfo.
for the ai_addr of the selected addrinfo, call getnameinfo() to get the associated host name.
if this fails, call gethostname(), then look up gethostbyname() on the result.
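To be concrete, the sequence is roughly the following in C (a simplified sketch, not the actual code; the port string and the getnameinfo flags are my guesses):

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    struct addrinfo hints, *res, *ai;
    memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_PASSIVE;              /* wildcard addresses suitable for bind() */

    if (getaddrinfo(NULL, "12345", &hints, &res) != 0)
        return 1;

    int fd = -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd >= 0)
            break;                               /* first socket that works is the "used" one */
    }

    char host[256];
    if (ai != NULL &&
        getnameinfo(ai->ai_addr, ai->ai_addrlen,
                    host, sizeof host, NULL, 0, NI_NAMEREQD) == 0) {
        printf("getnameinfo says: %s\n", host);  /* rarely succeeds for a wildcard address */
    } else if (gethostname(host, sizeof host) == 0) {
        printf("gethostname says: %s\n", host);  /* the fallback, then gethostbyname() on it */
    }

    if (fd >= 0)
        close(fd);
    freeaddrinfo(res);
    return 0;
}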
There are a few reasons I think this is wrong, but I want to verify my logic. Firstly, it seems from some experiments that getnameinfo() pretty much always fails here. I suppose that the input address is unknown, since it is a listening socket, not a destination, so it doesn't need a valid IP from this point of view. Then, calling gethostname() and passing the result to gethostbyname() pretty much always returns the same result as gethostname() by itself. In other words, it's just verifying the local host name, and seems pointless to me. This is problematic because it's not even necessarily usable by remote hosts, is it?
Somehow I think the whole idea of trying to determine your own host name on the subnet is not that useful; rather, you must send a message to another host and see what IP address it sees the message coming from. (Unfortunately in this context that doesn't make sense, since I don't know other peers at this level of the program.) For instance, the local host could have more than one NIC and therefore multiple IP addresses, so trying to determine a single host-address pair is nonsensical. (Is the correct resolution to just bind() and simultaneously listen on all addrinfo results?)
I also noticed that one can get names resolved by just passing them to getaddrinfo() and setting the AI_CANONNAME flag, meaning the getnameinfo() step may be redundant. However, I guess this is not done here because they are trying to determine some kind of unbiased view of the hostname without supplying it a priori. Of course, it fails, and they end up using gethostname() anyway! I also tried supplying "localhost" to getaddrinfo(), and it reports the host name in ai_canonname under Linux, but just results in "localhost" on OS X, so it is not so useful since this is supposed to be cross-platform.
I guess to summarize, my question is, what is the correct way, if one exists, to get a local hostname that can be announced to subnet peers, in modern socket programming? I am leaning towards replacing this code with simply returning the results of gethostname(), but I'm wondering if there's a more appropriate solution using modern calls like getaddrinfo().
If the answer is that there's no way to do this, I'll just have to use gethostname() anyways since I must return something here, or it would break the API.
If I read this correctly, you just want to get a non-localhost socket address that is likely to succeed for creating a local socket, and for a remote host to connect back on.
I have a function I wrote called "GetBestAddressForSocketBind" that you can use as a reference. You can get it off my GitHub project page here. You may need to reference some of the code in the parent directory.
The code essentially just uses getifaddrs to enumerate adapters and picks the first one that is "up", not a loopback/local and has an IP address of the desired address family (AF_INET or AF_INET6).
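To give an idea of what that approach looks like (this is only a rough sketch of the idea, not the code from my project):

#include <ifaddrs.h>
#include <net/if.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Walk the interfaces and print the first address that is up, not a
   loopback, and of the wanted address family. */
int main(void)
{
    struct ifaddrs *ifaddr, *ifa;
    if (getifaddrs(&ifaddr) != 0)
        return 1;

    for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL)
            continue;
        if (!(ifa->ifa_flags & IFF_UP) || (ifa->ifa_flags & IFF_LOOPBACK))
            continue;
        if (ifa->ifa_addr->sa_family != AF_INET)         /* or AF_INET6 */
            continue;

        char host[256];
        if (getnameinfo(ifa->ifa_addr, sizeof(struct sockaddr_in),
                        host, sizeof host, NULL, 0, NI_NUMERICHOST) == 0) {
            printf("%s: %s\n", ifa->ifa_name, host);
            break;                                       /* first match wins */
        }
    }

    freeifaddrs(ifaddr);
    return 0;
}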
Hope this helps.
I think that you should look at Ulrich Drepper's article about IPv6 programming. It is relatively short and may answer some of your concerns. I found it really useful. I'm posting this link because it is very difficult to answer your question(s) without (at least) pseudo-code.

Socket Protocol Fundamentals

Recently, while reading a Socket Programming HOWTO the following section jumped out at me:
But if you plan to reuse your socket for further transfers, you need to realize that there is no "EOT" (End of Transfer) on a socket. I repeat: if a socket send or recv returns after handling 0 bytes, the connection has been broken. If the connection has not been broken, you may wait on a recv forever, because the socket will not tell you that there's nothing more to read (for now). Now if you think about that a bit, you'll come to realize a fundamental truth of sockets: messages must either be fixed length (yuck), or be delimited (shrug), or indicate how long they are (much better), or end by shutting down the connection. The choice is entirely yours, (but some ways are righter than others).
This section highlights 4 possibilities for how a socket "protocol" may be written to pass messages. My question is, what is the preferred method to use for real applications?
Is it generally best to include message size with each message (presumably in a header), as the article more or less asserts? Are there any situations where another method would be preferable?
The common protocols either specify length in the header, or are delimited (like HTTP, for instance).
Keep in mind that this also depends on whether you use TCP or UDP sockets. Since TCP sockets are reliable you can be sure that you get everything you shoved into them. With UDP the story is different and more complex.
These are indeed our choices with TCP. HTTP, for example, uses a mix of the second, third, and fourth options (a double newline ends the request/response headers, which might contain the Content-Length header or indicate chunked encoding, or they might say Connection: close and not give you the content length but expect you to rely on reading until EOF).
I prefer the third option, i.e. self-describing messages, though fixed-length is plain easy when suitable.
If you're designing your own protocol then look at other people's work first; there might already be something similar out there that you could either use as is or repurpose and adjust. For example, ISO 8583 for financial transactions, HTTP and POP3 all do things differently, but in ways that are proven to work... In fact it's worth looking at these things anyway, as you'll learn a lot about how real-world protocols are put together.
If you need to write your own protocol then, IMHO, prefer length prefixed messages where possible. They're easy and efficient to parse for the receiver but possibly harder to generate if it is costly to determine the length of the data before you begin sending it.
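To sketch the idea (a 4-byte big-endian prefix; the helper names are just one way to do it):

#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>

/* Length-prefixed messages over TCP: a 4-byte big-endian length, then the
   payload. The loops matter because TCP may return short counts. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n <= 0) return -1;
        p += n; len -= (size_t) n;
    }
    return 0;
}

static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0) return -1;              /* 0 means the peer closed the connection */
        p += n; len -= (size_t) n;
    }
    return 0;
}

int send_message(int fd, const void *payload, uint32_t len)
{
    uint32_t be_len = htonl(len);           /* length prefix in network byte order */
    if (write_all(fd, &be_len, sizeof be_len) < 0) return -1;
    return write_all(fd, payload, len);
}

int recv_message(int fd, void *payload, uint32_t max)
{
    uint32_t be_len;
    if (read_all(fd, &be_len, sizeof be_len) < 0) return -1;
    uint32_t len = ntohl(be_len);
    if (len > max) return -1;               /* refuse oversized messages */
    if (read_all(fd, payload, len) < 0) return -1;
    return (int) len;
}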
The decision should depend on the data you want to send (what it is, how it is gathered). If the data is fixed length, then fixed-length packets will probably be best. If the data can easily (no escaping needed) be split into delimited entities, then delimiting may be good. If you know the data size when you start sending the data piece, then length-prefixing may be even better. If the data sent is always single characters, or even single bits (e.g. "on"/"off"), then anything other than fixed-size one-character messages would be overkill.
Also think how the protocol may evolve. EOL-delimited strings are good as long as they do not contain EOL characters themselves. Fixed length may be good until the data may be extended with some optional parts, etc.
I do not know if there is a preferred option. In our real-world situation (client-server application), we use the option of sending the total message length as one of the first pieces of data. It is simple and works for both our TCP and UDP implementations. It makes the logic reasonably "simple" when reading data in both situations. With TCP, the amount of code is fairly small (by comparison). The UDP version is a bit (understatement) more complex but still relies on the size that is passed in the initial packet to know when all data has been sent.