Can I have more than 32 netlink sockets in kernelspace?

I have several kernel modules which need to interact with userspace. Hence, each module has a Netlink socket.
My problem is that these sockets interfere with each other. This is because all of them register to the same Netlink address family (there aren't many available to begin with: the maximum is 32, and more than half are already reserved) and also because they all bind themselves to the same pid (the kernel pid, zero).
I wish there were more room for address families. Or, better yet, I wish I could bind my sockets to other pids. How come Netlink is the preferred user-kernel channel if only 32 sockets can be open at any one time?
libnl-3's documentation says
The netlink address (port) consists of a 32bit integer. Port 0 (zero) is reserved for the kernel and refers to the kernel side socket of each netlink protocol family. Other port numbers usually refer to user space owned sockets, although this is not enforced.
That last claim seems to be a lie right now. The kernel uses a constant as pid and doesn't export more versatile functions:
if (netlink_insert(sk, 0))
    goto out_sock_release;
I guess I can recompile the kernel and increase the address family limit. But these are kernel modules; I shouldn't have to do that.
Am I missing something?

No.
Netlink's socket count limit is why Generic Netlink exists.
Generic Netlink is a layer on top of stock Netlink. Instead of opening a socket, you register a callback on an already established socket, and listen to messages directed to a "sub"-family there. Given there are more available family slots (1023) and no ports, I'm assuming they felt a separation between families and ports was unnecessary at this layer.
To register a listener in kernelspace, use genl_register_family() or its siblings. In userspace, Generic Netlink can be used via libnl-3's API (its documentation is rather limited, but the code is open and speaks for itself).
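Here is a sketch of the userspace side, assuming Linux UAPI headers. Before a program can talk to a family registered with genl_register_family(), it has to ask the Generic Netlink controller (the built-in "nlctrl" family, GENL_ID_CTRL) for that family's numeric ID. The function below only builds the CTRL_CMD_GETFAMILY request in a caller-supplied buffer. The helper name and buffer handling are hypothetical and are not part of libnl's API.

```c
#include <linux/genetlink.h>  /* GENL_ID_CTRL, GENL_HDRLEN, CTRL_CMD_GETFAMILY, CTRL_ATTR_FAMILY_NAME */
#include <linux/netlink.h>    /* struct nlmsghdr, struct nlattr, NLMSG_* and NLA_* macros */
#include <stddef.h>
#include <string.h>

/*
 * Build a Generic Netlink controller request asking for the numeric ID of
 * the family called `name`. `buf` must be suitably aligned (e.g. an array
 * of unsigned int). Returns the message length to pass to send(), or 0 if
 * `buf` is too small.
 */
static size_t build_getfamily_request(void *buf, size_t buflen,
                                      const char *name, unsigned int seq)
{
    size_t name_len = strlen(name) + 1;                 /* include the NUL */
    size_t attr_len = NLA_HDRLEN + name_len;            /* unpadded attribute length */
    size_t msg_len  = NLMSG_LENGTH(GENL_HDRLEN + NLA_ALIGN(attr_len));

    if (buflen < msg_len)
        return 0;
    memset(buf, 0, msg_len);                            /* also zeroes the padding */

    /* Netlink header: addressed to the controller family. */
    struct nlmsghdr *nlh = buf;
    nlh->nlmsg_len   = msg_len;
    nlh->nlmsg_type  = GENL_ID_CTRL;
    nlh->nlmsg_flags = NLM_F_REQUEST;
    nlh->nlmsg_seq   = seq;
    nlh->nlmsg_pid   = 0;

    /* Generic Netlink header: which controller command to run. */
    struct genlmsghdr *genl = NLMSG_DATA(nlh);
    genl->cmd     = CTRL_CMD_GETFAMILY;
    genl->version = 1;   /* assumption: version 1, as commonly used for controller requests */

    /* One attribute: the family name we want resolved. */
    struct nlattr *attr = (struct nlattr *)((char *)genl + GENL_HDRLEN);
    attr->nla_type = CTRL_ATTR_FAMILY_NAME;
    attr->nla_len  = attr_len;
    memcpy((char *)attr + NLA_HDRLEN, name, name_len);

    return msg_len;
}
```

The buffer would then be sent with send() on a socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC) socket, and the ID read from the CTRL_ATTR_FAMILY_ID attribute of the reply. libnl-3 wraps the same exchange in genl_ctrl_resolve().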

You are confused by the MAX_LINKS variable name. It is not a "maximum number of links", it's a "maximum number of families". The things you listed are netlink families (the protocol argument of socket()). There are indeed 32 families, and each family is dedicated to a particular purpose. For example, NETLINK_SELINUX is for SELinux notifications and NETLINK_KOBJECT_UEVENT is for kobject notifications (these are what udev handles).
But there is no limit on the number of sockets in each family.
When netlink_create() is called, it checks your protocol number, which for a netlink socket is the netlink family, such as NETLINK_SELINUX. Look at the code:
static int netlink_create(struct net *net, struct socket *sock, int protocol,
                          int kern)
{
    ...
    if (protocol < 0 || protocol >= MAX_LINKS)
        return -EPROTONOSUPPORT;
    ...
This is how MAX_LINKS is used.
Later, to actually create the socket, it invokes __netlink_create(), which calls sk_alloc(), which in turn calls sk_prot_alloc(). sk_prot_alloc() allocates the socket with kmalloc(), since netlink doesn't have its own slab cache:
slab = prot->slab;
if (slab != NULL) {
    sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
    if (!sk)
        return sk;
    if (priority & __GFP_ZERO) {
        if (prot->clear_sk)
            prot->clear_sk(sk, prot->obj_size);
        else
            sk_prot_clear_nulls(sk, prot->obj_size);
    }
} else
    sk = kmalloc(prot->obj_size, priority);
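From userspace, you can check that one family accepts many sockets. The sketch below assumes Linux and uses NETLINK_ROUTE as an arbitrary example family. It opens several sockets in that family at the same time and binds each with nl_pid = 0, which asks the kernel to auto-assign a unique port ID. The helper name is hypothetical. This only demonstrates userspace sockets. The kernel-side socket of a protocol still claims port 0, as the netlink_insert(sk, 0) excerpt in the question shows.

```c
#include <linux/netlink.h>   /* struct sockaddr_nl, NETLINK_ROUTE */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open `n` (at most 16) sockets of the same netlink family simultaneously,
 * bind each with nl_pid = 0 so the kernel picks a unique port ID, and store
 * the assigned IDs in `ports`. Returns 0 on success, -1 if the environment
 * refuses netlink sockets. All sockets are closed before returning.
 */
static int open_same_family(int n, unsigned int *ports)
{
    int fds[16];
    int opened = 0, rc = 0;

    if (n < 0 || n > 16)
        return -1;

    for (; opened < n; opened++) {
        struct sockaddr_nl addr;
        socklen_t len = sizeof addr;

        fds[opened] = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
        if (fds[opened] < 0) { rc = -1; break; }

        memset(&addr, 0, sizeof addr);
        addr.nl_family = AF_NETLINK;
        addr.nl_pid = 0;                 /* 0: the kernel auto-assigns a port ID */
        if (bind(fds[opened], (struct sockaddr *)&addr, sizeof addr) < 0 ||
            getsockname(fds[opened], (struct sockaddr *)&addr, &len) < 0) {
            close(fds[opened]);          /* not counted in `opened`, so close it here */
            rc = -1;
            break;
        }
        ports[opened] = addr.nl_pid;     /* the port ID the kernel chose */
    }

    while (opened-- > 0)                 /* close everything that was kept open */
        close(fds[opened]);
    return rc;
}
```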

Related

Explain line "s = socket(res->ai_family, res->ai_socktype, res->ai_protocol)"

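#include <netdb.h>       /* getaddrinfo(), struct addrinfo */
#include <sys/socket.h>  /* socket(), SOCK_STREAM */
int s;
struct addrinfo hints = {0}, *res;  /* hints must be zeroed before use */
hints.ai_socktype = SOCK_STREAM;    /* only TCP results, as HTTP needs */
getaddrinfo("www.example.com", "http", &hints, &res);
s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);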
Please explain the last line of code
A socket is a very generic means of communication.
It could deal with communication between local processes, communication between your process and some internal aspects of your system's kernel (events...), communication through the network...
Even when it deals with the network, there exist many protocol families and many protocols.
That's why, when we create a socket (with the socket() call on your last line), we have to provide several parameters in order to select the right properties of the required socket.
man 2 socket mainly explains the first parameter (the domain, or protocol family). The other parameters are explained in other pages, such as ip(7) or tcp(7), since they depend on the choice made for the first parameter.
Note that once a socket is obtained with the socket() call, you may need to provide many other settings through other system calls, depending on your intention (bind() for a server, connect() for a client... many settings exist).
In your example, it seems that you want to reach an HTTP server named www.example.com.
You could have hardcoded the fact that such a server can be reached with the AF_INET protocol family (for IPv4, or AF_INET6 for IPv6), through a TCP connection (type SOCK_STREAM, protocol 0). However, the getaddrinfo() function can provide all these details, plus others to be used in subsequent system calls (for example, the IP address and port number for a subsequent connect() call).
All this information stands in the members of the returned struct addrinfo.
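Here is a sketch of the usual pattern around that line, assuming a POSIX system. It walks the list that getaddrinfo() returns and passes each entry's ai_family, ai_socktype and ai_protocol straight to socket(). The helper name socket_for() is hypothetical. A real client would also call connect() with ai_addr and ai_addrlen, and fall through to the next entry if that fails.

```c
#include <netdb.h>        /* getaddrinfo(), freeaddrinfo(), struct addrinfo */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>       /* close(), for callers */

/*
 * Resolve host/service and return a socket created from the first usable
 * result, writing the chosen address family to *family_out.
 * Returns the socket descriptor, or -1 on failure.
 */
static int socket_for(const char *host, const char *service, int *family_out)
{
    struct addrinfo hints, *res, *ai;
    int s = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family   = AF_UNSPEC;     /* IPv4 or IPv6, whichever resolves */
    hints.ai_socktype = SOCK_STREAM;   /* TCP, as HTTP needs */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        /* Each entry carries exactly the three values socket() needs. */
        s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (s >= 0) {
            *family_out = ai->ai_family;
            break;   /* a client would try connect(s, ai->ai_addr, ai->ai_addrlen) here */
        }
    }
    freeaddrinfo(res);   /* the list is no longer needed once the socket exists */
    return s;
}
```

For example, socket_for("www.example.com", "http", &family) would resolve the name from the question. Passing a numeric host such as "127.0.0.1" and port "80" avoids any DNS or /etc/services lookup.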

What's the difference between endpoint and socket?

Almost every definition of socket that I've seen relates it very closely to the term endpoint:
wikipedia:
A network socket is an internal endpoint for sending or receiving data
at a single node in a computer network. Concretely, it is a
representation of this endpoint in networking software
This answer:
a socket is an endpoint in a (bidirectional) communication
Oracle's definition:
A socket is one endpoint of a two-way communication link between two
programs running on the network
Even stackoverflow's definition of the tag 'sockets' is:
An endpoint of a bidirectional inter-process communication flow
This other answer goes a bit further:
A TCP socket is an endpoint instance
Although I don't understand what "instance" means in this case. If an endpoint is, according to this answer, a URL, I don't see how that can be instantiated.
"Endpoint" is a general term, including pipes, interfaces, nodes and such, while "socket" is a specific term in networking.
IMHO, logically "socket" and "endpoint" are the same, because both are the combination of an Internet address and a TCP port. Strictly speaking, in core networking there is no such thing as an "endpoint"; there is only "socket". Read on below...
As @Zac67 highlighted, "socket" is a very specific term in networking. If you read the TCP RFC (https://www.rfc-editor.org/rfc/rfc793), you won't find a single reference to "endpoint"; it only talks about "socket". But outside the RFC world, you will hear a lot about "endpoint".
Both terms refer to the combination of an IP address and a TCP port, but you wouldn't say to someone "please give me the socket of your application"; you would say "please give me the endpoint of your application". So, IMHO, the difference is one of context. Even though both refer to the combination of an IP address and a TCP port, you use "socket" when talking about computer processes or the OS, and "endpoint" when talking with someone in general.
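The following sketch makes the OS-level view concrete, assuming a POSIX system with IPv4 loopback. Here a socket is a handle (on Linux, a file descriptor), and its endpoints are the address/port pairs you read back with getsockname() and getpeername(). The listening socket and the socket returned by accept() are different objects, yet they report the same local endpoint. That is one way to read "a TCP socket is an endpoint instance". The struct and function names are hypothetical.

```c
#include <arpa/inet.h>    /* htonl(), ntohs() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>       /* close() */

/* Three socket objects created by the demo, and the ports they report. */
struct endpoint_demo {
    int listener_fd, client_fd, accepted_fd;
    unsigned short listener_port;        /* local port of the listening socket */
    unsigned short accepted_local_port;  /* local port of the socket accept() returned */
    unsigned short client_peer_port;     /* port the client sees on the other end */
};

/* Port of the local endpoint bound to fd (0 on error). */
static unsigned short local_port(int fd)
{
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    return getsockname(fd, (struct sockaddr *)&a, &len) == 0 ? ntohs(a.sin_port) : 0;
}

/* Port of the remote endpoint fd is connected to (0 on error). */
static unsigned short peer_port(int fd)
{
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    return getpeername(fd, (struct sockaddr *)&a, &len) == 0 ? ntohs(a.sin_port) : 0;
}

/* Returns 0 on success, -1 on failure; the caller closes every fd that is >= 0. */
static int endpoint_demo_run(struct endpoint_demo *d)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    d->listener_fd = d->client_fd = d->accepted_fd = -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */
    addr.sin_port = 0;                               /* let the OS choose a free port */

    d->listener_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (d->listener_fd < 0 ||
        bind(d->listener_fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(d->listener_fd, 1) < 0 ||
        getsockname(d->listener_fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;

    /* Blocking connect() returns once the kernel has completed the handshake
       and queued the connection for accept(), so no second thread is needed. */
    d->client_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (d->client_fd < 0 ||
        connect(d->client_fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        return -1;

    d->accepted_fd = accept(d->listener_fd, NULL, NULL);
    if (d->accepted_fd < 0)
        return -1;

    d->listener_port       = local_port(d->listener_fd);
    d->accepted_local_port = local_port(d->accepted_fd);
    d->client_peer_port    = peer_port(d->client_fd);
    return 0;
}
```

After a successful run, the listener and the accepted socket are different descriptors but report the same local port, and the client's peer port equals that port too. In other words, one address/port pair is shared by several socket objects.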
I come from the embedded-systems world and low-level things.
An endpoint is a hardware buffer constructed at the far end of your machine. What does that mean?
YourMachine <---------------> Device
[Socket] ----------------> [Endpoint]
[Endpoint] <---------------- [Socket]
Both sockets and endpoints are endpoints, but a socket is an endpoint that resides on the sender, which here is your machine ("socket" is a word used to distinguish between sender and receiver).
OK, now that we know it is a buffer, what is the relation between buffers and networking?
Windows
When you create a socket on Windows, the OS returns a handle to that socket. A socket is actually a kernel object, and when you create a kernel object in Windows, the returned value is a handle used to access that object. Handles are typically opaque values such as void* (or, for sockets, the integer type SOCKET) that Windows uses to look up the object. Once you have access to the socket kernel object, all I/O operations are handled in the OS kernel. Since you want to communicate with an external device, you have to reach the kernel first, and that is exactly what the socket does. In other words, creating a socket creates a socket object in the kernel = creates an endpoint in the kernel = creates a buffer in the kernel. That buffer is later used to stream data through the wires via the OS HAL (Hardware Abstraction Layer), so you can talk to other devices and you are happy.
Now, if the other device doesn't have a communication buffer (an endpoint), you can't communicate with it, even if you open a socket on your end: communication has to be two-way (send and receive).
Another example of accessing an I/O resource is accessing RAM (main memory). There are two ways of getting memory: from the process stack or from the process heap. The stack is not a kernel object; you can allocate stack space directly, without calling into the OS kernel, simply by subtracting a value from RSP (the stack pointer register). For example:
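; This example demonstrates how to allocate 32 contiguous bytes from the stack on Windows
; Intel (MASM) syntax
StackAllocate proc
    sub rsp, 20h    ; reserve 32 bytes: [rsp] through [rsp+1Fh] are now usable
    ; ... use the 32 bytes here ...
    add rsp, 20h    ; release them, so RSP points at the return address again before ret
    ret
StackAllocate endp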
Accessing the heap is different. When you call malloc() or the new operator, a long chain of calls runs through Windows code (the C runtime and the Windows heap manager), and when the heap runs out of committed memory, the heap manager has to ask the kernel for more. The stack allocation above involved nothing like that: all I did was subtract a number from the existing value in RSP, a register inside the CPU, so I never left my own code. The heap is managed through a heap handle that Windows uses to keep track of fragmented memory, and in the end it returns a void* to that memory.
Hope that helped

Why use htons() to specify protocol when creating packet socket?

To create a packet socket, the following socket() call is used (the socket type and protocol may differ):
socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))
And to create a stream socket, the following call is used:
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)
My question is: why use htons() to specify the protocol when creating a packet socket, but not when creating a socket of the AF_INET or AF_INET6 family? Why not use
socket(AF_INET, SOCK_XXX, htons(IPPROTO_XXX))
to create a stream or datagram socket, as is done when creating a packet socket, or vice versa? What is different about the protocol argument in the two calls, given that both create sockets, one a packet socket and the other a TCP-level socket?
First, like most other network parameters that are passed to the kernel (IP addresses, ports, etc), the parameters are passed in their "on-the-wire" format so that underlying software doesn't need to manipulate them before comparing/copying/transmitting/etc. (For comparison, consider that AF_PACKET and SOCK_RAW are parameters to the kernel itself -- hence "native format" is appropriate -- while the ETH_P_xxx value is generally for "comparison with incoming packets"; it just so happens that ETH_P_ALL is a special signal value saying 'capture everything'.)
Second, interpretation of the protocol is potentially different by address family. A different address family could choose to interpret the protocol in whatever form made sense for it. It just so happens that Ethernet and IP have always used big-endian (and were important/ubiquitous enough that big-endian came to be called network order).
Third, the protocol number in the AF_INET world (i.e. Internet Protocol) only occupies a single byte so it doesn't make sense to specify a byte-ordering.
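Here is a toy sketch, assuming Linux headers, of why the network-order value is convenient. The protocol given to an AF_PACKET socket is compared against the EtherType field of each frame, and that field is already in network byte order on the wire. The helper below is hypothetical and only mimics the comparison in userspace. The kernel's real check compares skb->protocol (a big-endian __be16) against the registered type in essentially the same way, without any byte swapping.

```c
#include <arpa/inet.h>      /* htons() */
#include <linux/if_ether.h> /* ETH_P_ALL, ETH_P_IP, ETH_P_ARP, ETH_HLEN */
#include <stdint.h>
#include <string.h>

/*
 * `proto_be` is in network byte order, exactly as passed to
 * socket(AF_PACKET, SOCK_RAW, htons(...)). The EtherType field (bytes 12-13
 * of an Ethernet header) is also in network byte order, so the two are
 * compared directly as raw 16-bit values.
 */
static int frame_matches(const unsigned char *frame, size_t len, uint16_t proto_be)
{
    uint16_t ethertype_be;

    if (proto_be == htons(ETH_P_ALL))   /* special signal value: capture everything */
        return 1;
    if (len < ETH_HLEN)                 /* too short to hold an Ethernet header */
        return 0;
    memcpy(&ethertype_be, frame + 12, sizeof ethertype_be);  /* raw on-the-wire bytes */
    return ethertype_be == proto_be;
}
```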

netlink connector sockets

I have worked with network programming before. But this is my first foray into netlink sockets.
I have chosen to study the 'connector' type of netlink sockets. As with any kernel component, it has a user counterpart as well. The linux kernel has a sample program called ucon.c which can be used to build userspace programs based on the aforementioned connector netlink sockets.
So here I wish to pin-point parts of the program that I want to confirm my understanding of and of parts of the program that I do not follow the logic of. Enough talking. Here we go. Please correct me wherever I go astray.
As far as I have understood, netlink sockets are an IPC method used to connect processes on the same machine, and hence the process ID is used as an identifier. And since netlink messages can also be multicast, another identifier that the netlink socket needs is the message group. All components connected to the same message group are in fact related. So while for IPv4 we use a sockaddr_in in place of the sockaddr, here we use a sockaddr_nl, which contains the identifiers mentioned above.
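Here is a minimal sketch of that addressing, assuming Linux UAPI headers. It is the netlink counterpart of filling in a sockaddr_in. Instead of an IP address and a port, a sockaddr_nl carries a port ID (nl_pid) and a bitmask of multicast groups (nl_groups) to join. The helper name and the group number are hypothetical.

```c
#include <linux/netlink.h>   /* struct sockaddr_nl */
#include <string.h>
#include <sys/socket.h>      /* AF_NETLINK */
#include <unistd.h>          /* getpid() */

/*
 * Fill a local netlink address that joins multicast group `group`
 * (1..32; 0 means "no group"). Group N corresponds to bit N-1 of nl_groups.
 * Groups above 32 would have to be joined with
 * setsockopt(SOL_NETLINK, NETLINK_ADD_MEMBERSHIP) instead.
 */
static void fill_local_nl_addr(struct sockaddr_nl *addr, unsigned int group)
{
    memset(addr, 0, sizeof *addr);           /* also clears nl_pad */
    addr->nl_family = AF_NETLINK;
    addr->nl_pid    = (unsigned int)getpid(); /* conventional; it only needs to be a unique port ID */
    addr->nl_groups = (group >= 1 && group <= 32) ? 1U << (group - 1) : 0;
}
```

The filled structure would then be passed to bind() on a netlink socket, for example one created with socket(AF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR).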
Now, since we are not going to use the TCP/IP stack of the kernel, in case of netlink messages, netlink packets can be considered to be raw (please correct me here if I am wrong). Hence the only encapsulation that the netlink packet goes through is the netlink message header defined as nlmsghdr.
Now, coming to our program ucon: main() first creates a NETLINK family socket with the connector protocol. Then it fills in the aforementioned netlink socket address structure with the relevant information. To experiment a little, I have added an entry in the connector.h file. Now here comes my first question.
A connector message has a certain type defined in connector.h. Now this connector message structure is something that is completely internal to netlink right? As in, as far as netlink is concerned, this is all but payload. Right?
Moving on, what exactly does the nl-group field mean within the netlink message header structure? The definition does not really contain an element of this name. So are we using overlay techniques to fill certain fields of the netlink message header? And if so, what exactly is the correspondence? I cannot seem to find it anywhere.
So after binding the socket address to the socket, it is sending 10,000 unique pieces of connector based data, which as far as netlink is concerned, is pure payload. But what is strange as far as these messages are concerned is, that all of them seem to have the same sequence number.
Moving on, we find ourselves in the netlink_send subroutine, which sends these packets via the socket we bound above. This subroutine uses a variety of netlink helper macros to manipulate the data to send. As we saw above, the main() function sends 10,000 pieces of data, each of which is zero-length and requires no acknowledgement, since the ack field is 0 (please correct me if I am wrong here). So each 'packet' is nothing but a connector message header without anything in it. Right?
Now what is surprising is that the netlink_send() function uses the same sequence number as main(), since it is a global variable. However, after the post-increment in main(), it is now '1'. So basically our netlink conversation starts with a sequence number of '1'. Is that fine?
Looking into some of the helper macros defined in linux/netlink.h, I will try to summarize my understanding of the ones that are directly or indirectly being used in this program.
#define NLMSG_LENGTH(len) ((len)+NLMSG_ALIGN(NLMSG_HDRLEN))
So this macro will first align the netlink message header length and then add the payload length to it. For our case the netlink payload is a connector header without any payload of its own.
In our case, this macro is used like so:
nlh->nlmsg_len = NLMSG_LENGTH(size - sizeof(*nlh));
Here, what I do not understand is the actual payload of the netlink message. In the above case, it is the size of the connector message header (since the connector message itself contains no payload of its own) minus the pointer (which points to the first byte of the netlink message and thereby to the netlink message header). And this pointer is (like any other pointer variable) equal to the machine word size, which in my case is 4 bytes. Why are we subtracting this from the connector message header?
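For reference, the sketch below shows how the lengths in that line work out, assuming the Linux UAPI headers <linux/netlink.h> and <linux/connector.h>. It is modeled on the snippet quoted above rather than copied from ucon.c, and the helper name is hypothetical. Note that in `size - sizeof(*nlh)`, sizeof(*nlh) is the size of the struct nlmsghdr that nlh points to (16 bytes), not the size of the pointer. sizeof never dereferences anything at run time.

```c
#include <linux/connector.h>  /* struct cn_msg, struct cb_id */
#include <linux/netlink.h>    /* struct nlmsghdr, NLMSG_* macros */
#include <string.h>

/*
 * Build one connector message carrying `payload_len` bytes inside a netlink
 * message. `buf` must be suitably aligned (e.g. an array of unsigned int).
 * Returns the number of bytes to hand to send(), or 0 if `buf` is too small.
 */
static unsigned int build_cn_message(void *buf, size_t buflen,
                                     const struct cb_id *id, unsigned int seq,
                                     const void *payload, unsigned short payload_len)
{
    struct nlmsghdr *nlh = buf;
    /* Netlink header + connector header + data, rounded up to NLMSG_ALIGNTO. */
    unsigned int size = NLMSG_SPACE(sizeof(struct cn_msg) + payload_len);

    if (buflen < size)
        return 0;
    memset(buf, 0, size);                  /* zeroes ack, flags and padding too */

    /* size - sizeof(struct nlmsghdr) is the (aligned) netlink payload length;
       NLMSG_LENGTH() adds the header length back, so nlmsg_len == size. */
    nlh->nlmsg_len  = NLMSG_LENGTH(size - sizeof(*nlh));
    nlh->nlmsg_type = NLMSG_DONE;          /* assumption: the type connector samples commonly use */
    nlh->nlmsg_seq  = seq;

    /* To netlink, everything from here on is opaque payload. */
    struct cn_msg *m = NLMSG_DATA(nlh);
    m->id  = *id;
    m->seq = seq;
    m->len = payload_len;
    if (payload_len)
        memcpy(m->data, payload, payload_len);
    return size;
}
```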
After that, we send the message over this netlink socket just like over any other IPv4 socket. I hope to hear from you fellows out there regarding the questions above. Including some sentences before the actual question would help, as my post is rather long. But I hope it will be useful to more people than just myself.
Regards.

Why do bind() and accept() let you specify the size of the struct?

bind() and accept() let you specify the size of the struct passed as the 2nd parameter. But I've only seen the size of the whole struct being passed. Why do they make you specify the size? Are there any instances where you would use a different number?
Different socket protocol families use different types of structures. For example, TCP and UDP sockets using IPv4 addresses utilize a sockaddr_in structure, which is 16 bytes in size, whereas IPv6 addresses utilize a sockaddr_in6 structure instead, which is 28 bytes.
The size of the sockaddr struct can vary, for instance depending on if you use IPv4 or IPv6.
The size is specified because these are system calls, which execute in kernel mode and the kernel's address space, and the kernel doesn't otherwise know how much data to copy between kernel address space and user address space. It can't see for example whether you are using an IPv4 or IPv6 address structure.
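The Linux-specific sketch below shows both directions of the length argument. bind() uses the length to know how many bytes to copy in, and for an IPv4 socket it rejects a length shorter than a sockaddr_in with EINVAL. getsockname() and accept() take a value-result length instead: on input it is the size of the caller's buffer, and on output it is the size the kernel actually wrote. The helper name is hypothetical.

```c
#include <arpa/inet.h>    /* htonl() */
#include <errno.h>
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind an IPv4 socket to 127.0.0.1 (port chosen by the OS), passing `addrlen`
 * as bind()'s size argument, then read the address back into a
 * sockaddr_storage with getsockname(). Returns the errno from socket()/bind()
 * (0 on success); *out_len receives the length getsockname() reported.
 */
static int bind_with_len(socklen_t addrlen, socklen_t *out_len)
{
    struct sockaddr_in sin;
    struct sockaddr_storage ss;       /* large enough for any address family */
    socklen_t len = sizeof ss;        /* in: buffer size; out: bytes filled */
    int err = 0;

    *out_len = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&sin, addrlen) < 0)
        err = errno;                  /* the kernel checked addrlen against sockaddr_in */
    else if (getsockname(fd, (struct sockaddr *)&ss, &len) == 0)
        *out_len = len;               /* shrunk from sizeof ss to sizeof(struct sockaddr_in) */

    close(fd);
    return err;
}
```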
The size could depend on the implementation, the type of socket and/or the platform. So if you pass this size with the call, the same code works on different platforms, no matter what extra fields or padding are used.
This is due to historical reasons: poor man's function overloading to accept different types of socket addresses, like IPv4, UNIX, IPv6. See page 68 of UNIX Network Programming: The sockets networking API for more details.