"nfds" parameter in select() in sockets programming - sockets

I never really got the idea behind this parameter. What is it good for? I also noticed that this parameter is ignored in WinSock2; why is that?
Do Unix systems use this parameter, or do they ignore it as well?

Windows' implementation of select() uses linked lists internally, so it doesn't need to use the nfds parameter for anything.
On other OSes, however, the fd_set struct is implemented to hold an array of bits (one bit per socket). For example, here is how it is declared (in sys/_types/_fd_def.h) under macOS:
typedef struct fd_set {
    __int32_t fds_bits[__DARWIN_howmany(__DARWIN_FD_SETSIZE, __DARWIN_NFDBITS)];
} fd_set;
... and in order to do the right thing, the select() call has to loop over the bits in the array to see what they contain. By supplying select() with the nfds parameter, we tell the implementation that it only needs to iterate over the first (nfds) bits of the array, rather than over the entire array on every call. This allows select() to be more efficient than it would otherwise be.
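To make this concrete, here is a simplified sketch (not actual kernel code) of the kind of scan a select() implementation might perform; fd_is_ready() is a hypothetical stand-in for the real per-descriptor readiness check:

#include <stdbool.h>
#include <sys/select.h>

/* Hypothetical stand-in for the real per-descriptor readiness check. */
static bool fd_is_ready(int fd) { (void)fd; return false; }

/* A select()-like implementation only scans descriptors 0..nfds-1,
 * not all FD_SETSIZE bits of the set. */
int count_ready(int nfds, fd_set *readfds) {
    int ready = 0;
    for (int fd = 0; fd < nfds; fd++)
        if (FD_ISSET(fd, readfds) && fd_is_ready(fd))
            ready++;
    return ready;
}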


Is it possible to tail call eBPF codes that use different modes?
For example, if I wrote a program that does printk("hello world") using a kprobe,
would I be able to tail call an XDP program afterwards, or vice versa?
I programmed something in eBPF that uses a socket buffer, and it seems that when I try to tail call another program that uses a kprobe, the program doesn't load.
I wanted to tail call a program that returns XDP_PASS after using BPF.SOCKET_FILTER mode, but the tail call doesn't seem to work.
I've been trying to figure this out, but I can't find any documentation on tail calling programs that use different modes :P
Thanks in advance!
No, it is not.
Have a look at kernel commit 04fd61ab36ec, which introduced tail calls: the comment in the first piece of code (in the internal kernel header bpf.h), defining struct bpf_array, adds an owner_prog_type member and explains it as follows:
/* 'ownership' of prog_array is claimed by the first program that
* is going to use this map or by the first program which FD is stored
* in the map to make sure that all callers and callees have the same
* prog_type and JITed flag
*/
So once the program type associated with a BPF program array (used for tail calls) has been defined, it is not possible to use it with other program types. Which makes sense, since different program types work with different contexts (packet data vs. traced function context vs. ...), can use different helpers, have return values with different meanings, and require different checks from the verifier, so it is hard to see how jumping from one type to another could work. How could you start by processing a network packet, and all of a sudden jump to a piece of code that is supposed to trace some internals of the kernel? :)
Note that it is also impossible to mix JIT-ed and non-JIT-ed programs, as indicated by the owner_jited member of the struct.
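For reference, here is a minimal libbpf-style sketch (illustrative, not taken from the commit) of a valid tail call setup: both the caller and the callee are XDP programs, so the program array is "owned" by the XDP type, and storing a program of any other type (e.g. a kprobe program) in jmp_table would be rejected:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Program array used for tail calls; user space is expected to store
 * the file descriptor of xdp_callee at index 0. */
struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u32);
} jmp_table SEC(".maps");

SEC("xdp")
int xdp_callee(struct xdp_md *ctx)
{
    return XDP_PASS;
}

SEC("xdp")
int xdp_caller(struct xdp_md *ctx)
{
    /* Jumps to the program stored at index 0; falls through on failure. */
    bpf_tail_call(ctx, &jmp_table, 0);
    return XDP_DROP;
}

char _license[] SEC("license") = "GPL";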

Is it a pure function?

I have the following function:
def timestamp(key: String): String =
  Monoid.combine(key, Instant.now().getEpochSecond.toString)
and wanted to know whether it is pure or not. To me, a pure function is one that, given the same input, always returns the same output. But the function above, given the same string, will return a different string each time (because the time changes), so in my opinion it is not pure.
No, it's not pure by any definition I know of. A good discussion of pure functions is here: https://alvinalexander.com/scala/fp-book/definition-of-pure-function. In Alvin's definition of purity he says:
A pure function has no “back doors,” which means:
...
It cannot depend on any external I/O. It can’t rely on input from files, databases, web services, UIs, etc; it can’t produce output, such as writing to a file, database, or web service, writing to a screen, etc.
Reading the current system time uses I/O, so the function is not pure.
You are right, it is not a pure function, as it returns different results for the same arguments. Mathematically speaking, it is not a function at all.
Definition of a pure function, from Wikipedia:
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change while program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices (usually—see below).
Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices (usually—see below).
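To make the distinction concrete, here is a small sketch in C (the function names are made up for illustration): the impure version depends on hidden state (the system clock), while the pure version takes the clock reading as an explicit argument, so the same inputs always produce the same output.

#include <stdio.h>
#include <time.h>

/* Impure: the result depends on the system clock, hidden state that
 * changes between calls made with the same key. */
void timestamp_impure(const char *key, char *out, size_t n) {
    snprintf(out, n, "%s%lld", key, (long long)time(NULL));
}

/* Pure: the clock reading is an explicit argument, so identical
 * (key, now) pairs always produce identical output. */
void timestamp_pure(const char *key, long long now, char *out, size_t n) {
    snprintf(out, n, "%s%lld", key, now);
}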

New to socket programming, questions regarding "select()"

Currently in my degree we're starting to work with sockets.
I have a couple of questions regarding polling for input from sockets using the select() function.
int select(int nfds,
           fd_set *readfds,
           fd_set *writefds,
           fd_set *exceptfds,
           struct timeval *timeout);
We give select the "nfds" param, which would normally be the maximum socket number we would like to monitor. How can I watch only one specific socket instead of the range of 0 to nfds_val sockets?
What are the file descriptor objects that we use? What is their purpose,
and why can't we just point "select" to the relevant socket structure?
I've read over the forum regarding the blocking and non-blocking modes of select, but I couldn't understand the meaning or uses of each, nor how to implement them; I would be glad if someone could explain.
Last but not least (only for the time being :D) - when binding a sockaddr_in to a socket number, why does one need to cast it to sockaddr * rather than leave it as sockaddr_in *?
I mean, except for the fact that the bind method expects this kind of pointer type ;)
Would appreciate some expert answers here :)
Thank you guys and have a great week!
We give select the "nfds" param, which would normally be the maximum socket number we would like to monitor. How can I watch only one specific socket instead of the range of 0 to nfds_val sockets?
Just provide your socket descriptor + 1. I'm pretty sure it doesn't mean the OS will check all the descriptors in the [0, 1, ..., descriptor] range.
What are the file descriptor objects that we use? What is their purpose, and why can't we just point "select" to the relevant socket structure?
File descriptors are usually integer values given to the user by the OS. The OS uses descriptors to keep track of physical and logical resources: one file descriptor means the OS has given you something file-like to control. Since Berkeley sockets have read and write operations defined, they are file-like, and socket objects are essentially plain file descriptors.
Answering why can't we just point "select" to the relevant socket structure? - we actually can. What exactly to pass to select depends on the OS and the language. In C you place your socket descriptor (most probably a plain int value) into an fd_set, which is then passed to select.
A tiny example for Linux:
fd_set set;
FD_ZERO(&set);
FD_SET(socket_fd, &set);

// check if socket_fd is ready for reading
int result = select(socket_fd + 1, &set, NULL, NULL, NULL);
if (result == -1)
    report_error(errno);
See the select(2) man page for the details.
Windows has similar code.
I've read over the forum regarding the blocking and non-blocking modes of select, but I couldn't understand the meaning or uses of each, nor how to implement them; I would be glad if someone could explain.
A blocking operation makes your thread wait until it is done; that describes 99% of the functions you use. If there are sockets ready for some IO, a blocking select will return immediately. If there are no such sockets, the thread will wait for them. A non-blocking select (one given a zero timeout), in the latter case, won't wait and will return 0 right away (no descriptors ready).
As an example, try to implement a single-threaded server that is capable of working with multiple clients, including long operations like file transfers happening simultaneously. You definitely don't want to use blocking socket operations in this case.
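For illustration, here is a minimal sketch of such a single-threaded select() loop (an echo server; the serve() function and its error handling are simplified for the example):

#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

void serve(int listen_fd) {
    fd_set master, readable;
    FD_ZERO(&master);
    FD_SET(listen_fd, &master);
    int max_fd = listen_fd;

    for (;;) {
        readable = master;  /* select() modifies the set, so pass a copy */
        if (select(max_fd + 1, &readable, NULL, NULL, NULL) == -1)
            break;

        for (int fd = 0; fd <= max_fd; fd++) {
            if (!FD_ISSET(fd, &readable))
                continue;
            if (fd == listen_fd) {  /* new incoming connection */
                int client = accept(listen_fd, NULL, NULL);
                if (client != -1) {
                    FD_SET(client, &master);
                    if (client > max_fd)
                        max_fd = client;
                }
            } else {  /* data (or EOF) from an existing client */
                char buf[512];
                ssize_t n = recv(fd, buf, sizeof buf, 0);
                if (n <= 0) {  /* connection closed or error */
                    close(fd);
                    FD_CLR(fd, &master);
                } else {
                    send(fd, buf, n, 0);  /* echo the data back */
                }
            }
        }
    }
}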
Last but not least (only for the time being :D) - when binding a sockaddr_in to a socket number, why does one need to cast it to sockaddr * rather than leave it as sockaddr_in *? I mean, except for the fact that the bind method expects this kind of pointer type ;)
Probably due to historical reasons, but I'm not sure. And there seems to be a fine answer on SO already.

NSInvocation needing NSMethodSignature

I have been wondering for a couple of days whether NSInvocation should really need the NSMethodSignature.
Let's say we want to write our own NSInvocation; my requirements would be as follows:
I need a selector SEL
The target object to call the selector on
The argument array
Then I would get the IMP from the target and the SEL, and pass the arguments as parameters.
So, my question is, why do we need an NSMethodSignature to construct and use an NSInvocation?
Note: I do know that by having only a SEL and a target we don't have the arguments and return type for this method, but why would we care about the types of the args and returns?
Each type in C has a different size. (Even the same type can have different sizes on different systems, but we'll ignore that for now.) int can have 32 or 64 bits, depending on the system. double takes 64 bits. char represents 8 bits (but might be passed as a regular int, depending on the system's calling convention). And finally, and most importantly, struct types have various sizes, depending on how many elements they contain and each of their sizes; there is no bound on how big a struct can be. So it is impossible to pass arguments the same way regardless of type; how the calling function arranges the arguments, and how the called function interprets them, must depend on the function's signature. (You cannot have a type-agnostic "argument array"; what would be the size of the array elements?)

When normal function calls are compiled, the compiler knows the signature at compile time and can arrange the arguments correctly according to the calling convention. But NSInvocation is for managing an invocation at runtime. Therefore, it needs a representation of the method signature to work.
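As a purely illustrative aside, a quick C program shows how much those sizes can vary (the exact numbers depend on the platform; struct big is a made-up type):

#include <stdio.h>

/* A struct can be arbitrarily large; there is no upper bound. */
struct big { double values[16]; };

int main(void) {
    printf("char:       %zu byte(s)\n", sizeof(char));
    printf("int:        %zu byte(s)\n", sizeof(int));
    printf("double:     %zu byte(s)\n", sizeof(double));
    printf("struct big: %zu byte(s)\n", sizeof(struct big));
    return 0;
}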
There are several things that the NSInvocation can do. Each of those things requires knowledge of the number and types (at least the sizes of the types) of the parameters:
When a message is sent to an object that does not have a method for it, the runtime constructs an NSInvocation object and passes it to -forwardInvocation:. The NSInvocation object contains a copy of all the arguments that are passed, since it can be stored and invoked later. Therefore, the runtime needs to know, at the very least, how big the parameters in total are, in order to copy the right amount of data from registers and/or stack (depending on how arguments are arranged in the calling convention) into the NSInvocation object.
When you have an NSInvocation object, you can query the value of the i'th argument using -getArgument:atIndex:. You can also set/change the value of the i'th argument using -setArgument:atIndex:. This requires it to know (1) where in its data buffer the i'th parameter starts, which requires knowing how big the previous parameters are, and (2) how big the i'th parameter is, so that it can copy the right amount of data (if it copies too little, the value will be corrupt; if it copies too much on getArgument, it can overwrite the buffer you gave it; if it copies too much on setArgument, it can overwrite other arguments).
You can have it do -retainArguments, which causes it to retain all arguments of object pointer type. This requires it to distinguish object pointer types from other types, so the type information must include more than just the sizes.
You can invoke the NSInvocation, which causes it to construct and execute the call to the method. This requires it to know, at the very least, how much data to copy from its buffer into the registers/stack to place all the data where the function will expect it. This requires knowing at least the combined size of all parameters, and probably also needs to know the sizes of individual parameters, so that the divide between parameters on registers and parameters on stack can be figured out correctly.
You can get the return value of the call using -getReturnValue:; this has similar issues to the getting of arguments above.
A thing not mentioned above is that the return type may also have a great effect on the calling mechanism. On x86 and ARM, the common architectures for Objective-C, when the return type is a struct type, the calling convention is very different: effectively an additional (first) parameter is added before all the regular parameters, a pointer to the space where the struct result should be written. This is instead of the regular calling convention, where the result is returned in a register. (On PowerPC, I believe the double return type is also treated specially.) So knowing the return type is essential for constructing and invoking the NSInvocation.
NSMethodSignature is required for the message sending and forwarding mechanism to work properly for invocations. NSMethodSignature and NSInvocation were built as a wrapper around __builtin_call(), which is both architecture dependent, and extremely conservative about the stack space a given function requires. Hence, when invocations are invoked, __builtin_call() gets all the information it needs from the method signature, and can fail gracefully by throwing the call out to the forwarding mechanism knowing that it, too, receives the proper information about how the stack should look for re-invocation.
That being said, you cannot make a primitive NSInvocation without a method signature without modifying the C language to support converting arrays to VARARGS, seeing as objc_msgSend() and its cousins won't allow it. Even if you could get around that, you'd need to calculate the sizes of the arguments and the return type (not too hard, but if you're wrong, you're wrong big time), and manage a proper call to __builtin_call(), which would require an intimate knowledge of the message-sending architecture or an FFI (which probably drops down to __builtin_call() anyhow).

How to use the select() function in socket programming?

The prototype is:
int select(int nfds,
           fd_set *readfds,
           fd_set *writefds,
           fd_set *exceptfds,
           struct timeval *timeout);
I've been struggling to understand this function for quite some time. My question is: if it checks all the file descriptors from 0 to nfds-1, and modifies readfds, writefds, and exceptfds on return, why do I need to use FD_SET to add file descriptors to the set at the beginning? It will check all the file descriptors anyway, or not?
It won't check from 0 to nfds-1. The first argument just provides an upper bound on how large, numerically, the file descriptors used are. This is because the set itself might be represented as a bitvector, without a way to know how many bits are actually used. Specifying this as a separate argument helps select() avoid checking file descriptors that are not in use.
Also, a descriptor that is not in e.g. the read set when you call select() is not being checked at all, so it cannot appear in the set when the call returns, either.
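As a small, self-contained illustration (watching only standard input, purely as an example):

#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

int main(void) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(STDIN_FILENO, &readfds);  /* watch only stdin */

    /* nfds is the highest watched descriptor + 1; descriptors that were
     * never FD_SET cannot appear in the set when select() returns. */
    int n = select(STDIN_FILENO + 1, &readfds, NULL, NULL, NULL);
    if (n > 0 && FD_ISSET(STDIN_FILENO, &readfds))
        printf("stdin is readable\n");
    return 0;
}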
I once had the same doubt. You can look at the following question and its answers:
Query on Select System Call