Where does kevent() validate its file descriptor?

Let's say we have a simple program like this:
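#include <sys/event.h>
#include <sys/time.h>
int fd = -1;                      /* e.g. a descriptor that has already been closed */
int kq = kqueue();
struct kevent kev, ke;
struct timespec timeout = { 5, 0 };
EV_SET(&kev, fd, EVFILT_VNODE, EV_ADD, 0, 0, NULL);
kevent(kq, &kev, 1, &ke, 1, &timeout);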
Where does kevent() check whether fd is a valid file descriptor? For example, the descriptor fd might already have been closed.

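I'm not going to cover how a system call is dispatched in FreeBSD (you may owe the oracle another question), but sys_kevent() (/usr/src/sys/kern/kern_event.c) is called, and it calls kern_kevent() (same file). In the code below, fget() checks the descriptor it is given (that it exists, that it carries the required capability rights, etc.; it also takes a reference on it, which fdrop() later releases) before handing back a file pointer. Note that the fd checked here is the kqueue descriptor itself (kq in your example). The descriptor you put into the kevent (fd) is checked later, while kern_kevent_fp() processes the changelist: kqueue_register() (same file) calls fget() on kev->ident for descriptor-based filters such as EVFILT_VNODE. If that fails and there is room in the eventlist, the error (e.g. EBADF) is returned as an event with EV_ERROR set in flags and the error code in data; otherwise kevent() itself returns -1 with errno set.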
int
kern_kevent(struct thread *td, int fd, int nchanges, int nevents,
    struct kevent_copyops *k_ops, const struct timespec *timeout)
{
    cap_rights_t rights;
    struct file *fp;
    int error;
    cap_rights_init(&rights);
    if (nchanges > 0)
        cap_rights_set(&rights, CAP_KQUEUE_CHANGE);
    if (nevents > 0)
        cap_rights_set(&rights, CAP_KQUEUE_EVENT);
    error = fget(td, fd, &rights, &fp);
    if (error != 0)
        return (error);
    error = kern_kevent_fp(td, fp, nchanges, nevents, k_ops, timeout);
    fdrop(fp, td);
    return (error);
}
Heh... at what point did I become the UTSL nazi? Sigh.

Related

How can we determine whether a socket is ready to read/write?

How can we determine whether a socket is ready to read/write in socket programming.
On Linux, use select() or poll().
On Windows, you can use WSAPoll() or select(), both from winsock2.
Mac OS X also has select() and poll().
#include <sys/select.h>
int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking. – https://linux.die.net/man/3/fd_set
#include <poll.h>
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
poll() performs a similar task to select(2): it waits for one of a set of file descriptors to become ready to perform I/O.
– https://linux.die.net/man/2/poll
Example of select usage:
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int
main(void)
{
    fd_set rfds;
    struct timeval tv;
    int retval;
    /* Watch stdin (fd 0) to see when it has input. */
    FD_ZERO(&rfds);
    FD_SET(0, &rfds);
    /* Wait up to five seconds. */
    tv.tv_sec = 5;
    tv.tv_usec = 0;
    retval = select(1, &rfds, NULL, NULL, &tv);
    /* Don't rely on the value of tv now! */
    if (retval == -1)
        perror("select()");
    else if (retval)
        printf("Data is available now.\n");
        /* FD_ISSET(0, &rfds) will be true. */
    else
        printf("No data within five seconds.\n");
    exit(EXIT_SUCCESS);
}
Explanation of the above code:
FD_ZERO initializes the rfds set. FD_SET(0, &rfds) adds fd 0 (stdin) to the set. FD_ISSET can be used to check whether a specific file descriptor is ready after select returns.
The select call in this example waits until stdin has input or until 5 seconds pass. The two NULLs in the select call are where the file descriptor sets (fd_sets) to be checked for write readiness and for exceptional conditions, respectively, would be passed. The tv argument gives the maximum time to wait, in seconds and microseconds; on Linux, select() may modify tv to reflect the time not slept, which is why the example says not to rely on its value afterwards. The first argument to select, nfds, is the highest-numbered file descriptor in any of the three sets (read, write and exception sets) plus one.
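The same pattern generalizes from stdin to any descriptor, such as a socket. The sketch below is not from the original answer; the helper name wait_readable is hypothetical. It isolates the nfds = fd + 1 rule and the FD_ISSET check, and it guards against descriptors that an fd_set cannot hold (FD_SETSIZE).

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>   /* pipe(), write() for the usage example */

/* Wait up to timeout_sec seconds for fd to become readable.
 * Returns 1 if it is readable, 0 on timeout, -1 on error (errno set). */
int wait_readable(int fd, int timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int ret;

    /* An fd_set can only hold descriptors below FD_SETSIZE. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_sec;   /* 0 means "just check, don't wait" */
    tv.tv_usec = 0;

    /* nfds is the highest-numbered descriptor in any set, plus one. */
    ret = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;              /* timed out */

    /* select() rewrote rfds to contain only the ready descriptors. */
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}
```

For example, with a pipe p created by pipe(p), wait_readable(p[0], 0) reports 0 while the pipe is empty and 1 after a byte has been written to p[1]. Because of the FD_SETSIZE limit, poll() is usually preferred when descriptor numbers can grow large.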
Example of poll usage (from man7.org):
/* poll_input.c
   Licensed under GNU General Public License v2 or later.
*/
#include <poll.h>
#include <fcntl.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)
int
main(int argc, char *argv[])
{
    int nfds, num_open_fds;
    struct pollfd *pfds;
    if (argc < 2) {
        fprintf(stderr, "Usage: %s file...\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    num_open_fds = nfds = argc - 1;
    pfds = calloc(nfds, sizeof(struct pollfd));
    if (pfds == NULL)
        errExit("malloc");
    /* Open each file on command line, and add it to the 'pfds' array. */
    for (int j = 0; j < nfds; j++) {
        pfds[j].fd = open(argv[j + 1], O_RDONLY);
        if (pfds[j].fd == -1)
            errExit("open");
        printf("Opened \"%s\" on fd %d\n", argv[j + 1], pfds[j].fd);
        pfds[j].events = POLLIN;
    }
    /* Keep calling poll() as long as at least one file descriptor is
       open. */
    while (num_open_fds > 0) {
        int ready;
        printf("About to poll()\n");
        ready = poll(pfds, nfds, -1);
        if (ready == -1)
            errExit("poll");
        printf("Ready: %d\n", ready);
        /* Deal with array returned by poll(). */
        for (int j = 0; j < nfds; j++) {
            char buf[10];
            if (pfds[j].revents != 0) {
                printf(" fd=%d; events: %s%s%s\n", pfds[j].fd,
                       (pfds[j].revents & POLLIN)  ? "POLLIN "  : "",
                       (pfds[j].revents & POLLHUP) ? "POLLHUP " : "",
                       (pfds[j].revents & POLLERR) ? "POLLERR " : "");
                if (pfds[j].revents & POLLIN) {
                    ssize_t s = read(pfds[j].fd, buf, sizeof(buf));
                    if (s == -1)
                        errExit("read");
                    printf(" read %zd bytes: %.*s\n",
                           s, (int) s, buf);
                } else {                /* POLLERR | POLLHUP */
                    printf(" closing fd %d\n", pfds[j].fd);
                    if (close(pfds[j].fd) == -1)
                        errExit("close");
                    pfds[j].fd = -1;    /* poll() ignores negative fds, so the
                                           closed descriptor is not reported
                                           again as POLLNVAL */
                    num_open_fds--;
                }
            }
        }
    }
    printf("All file descriptors closed; bye\n");
    exit(EXIT_SUCCESS);
}
Explanation of above code:
This code is a bit more complex than the previous example.
argc is the number of arguments and argv is the array of arguments given to the program; argv[0] is usually the name of the program. If argc is less than 2 (which means no file names were given, only the program name), the program prints a usage message and exits with a failure code.
pfds = calloc(nfds, sizeof(struct pollfd)); allocates a zeroed array of nfds struct pollfd elements. If pfds is NULL, calloc failed (usually because the program ran out of memory), so the program prints the error with perror and exits.
The for loop opens each file named on the command line read-only and stores the resulting descriptor in the corresponding element of the pfds array. It then sets .events on each element to POLLIN, which tells poll to check whether that file descriptor is ready to read.
The while loop is where the actual call to poll() happens. The array of struct pollfd (pfds), the number of entries (nfds) and a timeout of -1, meaning wait indefinitely, are passed to poll. The return value is then checked for an error (poll returns -1 on error); if there is one, the program prints an error message and exits. Otherwise the number of ready file descriptors is printed.
In the second for loop inside the while loop, the program iterates over the array of pollfds and checks the .revents field of each structure. If that field is nonzero, an event occurred on the corresponding file descriptor. The program prints the file descriptor and the events, which can be POLLIN (ready for input), POLLHUP (hang up) or POLLERR (error condition). If POLLIN is set, the file is ready to be read.
The program then reads up to 10 bytes (sizeof(buf)) into buf. If an error happens while reading, the program prints an error and exits. Otherwise, it prints the number of bytes read and that many bytes of buf.
If POLLIN is not set (for example on POLLERR or POLLHUP), the program closes the file descriptor and decrements num_open_fds.
Finally the program says that all file descriptors are closed and exits with EXIT_SUCCESS.
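To answer the original question (is a socket ready to read or write?) with poll(), a single struct pollfd is enough. The helper below is a sketch, not part of the man-page example, and the name check_ready is hypothetical. It returns the revents mask so the caller can test POLLIN, POLLOUT, POLLHUP, POLLERR or POLLNVAL (the last is reported for a descriptor that is not open).

```c
#include <poll.h>
#include <unistd.h>   /* pipe(), write(), close() for the usage example */

/* Poll a single descriptor for the requested events (e.g. POLLIN, POLLOUT,
 * or both OR-ed together). timeout_ms follows poll(): -1 waits forever,
 * 0 only checks the current state.
 * Returns the revents mask (0 on timeout) or -1 on error. */
int check_ready(int fd, short events, int timeout_ms)
{
    struct pollfd pfd;
    int ret;

    pfd.fd = fd;
    pfd.events = events;
    pfd.revents = 0;

    ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;              /* timed out: nothing ready */

    /* May include POLLHUP, POLLERR or POLLNVAL even if not requested. */
    return pfd.revents;
}
```

For a connected socket, check_ready(sock, POLLIN | POLLOUT, 0) tells you at once whether a recv() would not block (it may return 0 if the peer has closed the connection) and whether a send() is expected not to block.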

What does it mean when CreateNamedPipe returns 0xFFFFFFFF but perror() says "NO ERROR"?

I am using CreateNamedPipe. It returns 0xFFFFFFFF, but when I call GetLastError and perror I get "NO ERROR".
I have checked https://learn.microsoft.com/en-us/windows/win32/ipc/multithreaded-pipe-server and my code is very similar.
I based my code on an example provided here: https://stackoverflow.com/questions/47731784/c-createnamedpipe-error-path-not-found-3#= and the poster says it means ERROR_PATH_NOT_FOUND (3). But my address is "\\.\pipe\pipe_com1". Note that Stack Overflow seems to remove the extra backslashes, but you will see them in the paste of my code.
I followed the example here: Create Named Pipe C++ Windows but I still get the error. Here is my code:
// Create a named pipe
// It is used to test TcpToNamedPipe to be sure it is addressing the named pipe
#include <windows.h>
#include <stdio.h>
#include <process.h>
char ch;
int main(int nargs, char** argv)
{
    if (nargs != 2)
    {
        printf("Usage pipe name is first arg\n");
        printf("press any key to exit ");
        scanf("%c", &ch);
        return -1;
    }
    char buffer[1024];
    HANDLE hPipe;
    DWORD dwRead;
    sprintf(buffer, "\\\\.\\pipe\\%s", argv[1]);
    hPipe = CreateNamedPipe((LPCWSTR)buffer,
        PIPE_ACCESS_DUPLEX,
        PIPE_TYPE_BYTE | PIPE_READMODE_BYTE | PIPE_WAIT, // FILE_FLAG_FIRST_PIPE_INSTANCE is not needed but forces CreateNamedPipe(..) to fail if the pipe already exists...
        1,
        1024 * 16,
        1024 * 16,
        NMPWAIT_USE_DEFAULT_WAIT,
        NULL);
    if (hPipe == INVALID_HANDLE_VALUE)
    {
        //int errorno = GetLastError();
        //printf("error creating pipe %d\n", errorno);
        perror("");
        printf("press any key to exit ");
        scanf("%c", &ch);
        return -1;
    }
    while (hPipe != INVALID_HANDLE_VALUE)
    {
        if (ConnectNamedPipe(hPipe, NULL) != FALSE)   // wait for someone to connect to the pipe
        {
            while (ReadFile(hPipe, buffer, sizeof(buffer) - 1, &dwRead, NULL) != FALSE)
            {
                /* add terminating zero */
                buffer[dwRead] = '\0';
                /* do something with data in buffer */
                printf("%s", buffer);
            }
        }
        DisconnectNamedPipe(hPipe);
    }
    return 0;
}
I'm guessing that the pointer to the name may be wrong and CreateNamedPipe is not seeing the name of the pipe properly. So I looked at the disassembly and noticed that the address is in fact a far pointer. Here is that disassembly:
00CA1A45 mov esi,esp
00CA1A47 push 0
00CA1A49 push 0
00CA1A4B push 4000h
00CA1A50 push 4000h
00CA1A55 push 1
00CA1A57 push 0
00CA1A59 push 3
00CA1A5B lea eax,[buffer]
00CA1A61 push eax
00CA1A62 call dword ptr [__imp__CreateNamedPipeW#32 (0CAB00Ch)]
Can someone spot my problem?

SCTP: What should the sctp_status.sstat_state value of an SCTP socket be after a successful connect() call?

I'm trying to connect via SCTP to a remote peer (which I have no direct access to, other than connecting to it via a socket and pinging it). Assuming that I have connected successfully, what should the value of sctp_status.sstat_state be when I call getsockopt()? Mine is SCTP_COOKIE_ECHOED (3) according to sctp.h. Is that correct? Shouldn't it be SCTP_ESTABLISHED?
I ask because I tried sending a message to the remote peer with this code:
ret = sctp_sendmsg (connSock, (void *) data, (size_t) strlen (data), (struct sockaddr *) &servaddr, sizeof (servaddr), 46, 0, 0, 0, 0);
It returned the number of bytes I tried to send. Then I tried to receive a response:
ret = sctp_recvmsg (connSock, (void *) reply, sizeof (reply), NULL,
NULL, NULL, &flags);
It returns -1 with errno set to ECONNRESET (104). What are the possible mistakes in my code, or maybe in my flow? Did I miss something?
Thanks in advance for answering. Will gladly appreciate that. :)
Update: Below is my client code for connecting to the remote peer. It's actually a Node.js addon, since SCTP is not fully supported in Node. I'm using the lksctp-tools package for the headers.
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/select.h>
#include <netinet/in.h>
#include <netinet/sctp.h>
#include <arpa/inet.h>
#include <signal.h>
#define MAX_BUFFER 1024
int connSock = 0;
int connect(char host[], int port, char remote_host[], int remote_port, int timeout) {
    int ret, flags;
    fd_set rset, wset;
    struct sockaddr_in servaddr;
    struct sockaddr_in locaddr;
    struct sctp_initmsg initmsg;
    struct timeval tval;
    struct sctp_status status;
    socklen_t opt_len;
    errno = 0;
    connSock = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
    if (connSock == -1)   /* check before using the descriptor in fcntl() */
    {
        return (-1);
    }
    flags = fcntl(connSock, F_GETFL, 0);
    fcntl(connSock, F_SETFL, flags | O_NONBLOCK);
    memset(&locaddr, 0, sizeof(locaddr));
    locaddr.sin_family = AF_INET;
    locaddr.sin_port = htons(port);
    locaddr.sin_addr.s_addr = inet_addr(host);
    ret = bind(connSock, (struct sockaddr *)&locaddr, sizeof(locaddr));
    if (ret == -1)
    {
        return (-1);
    }
    memset(&initmsg, 0, sizeof(initmsg));
    initmsg.sinit_num_ostreams = 5;
    initmsg.sinit_max_instreams = 5;
    initmsg.sinit_max_attempts = 10;
    ret = setsockopt(connSock, IPPROTO_SCTP, SCTP_INITMSG, &initmsg, sizeof(initmsg));
    if (ret == -1)
    {
        return (-1);
    }
    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(remote_port);
    servaddr.sin_addr.s_addr = inet_addr(remote_host);
    if ((ret = connect(connSock, (struct sockaddr *) &servaddr, sizeof(servaddr))) < 0)
        if (errno != EINPROGRESS)
            return (-1);
    if (ret == 0) {
        fcntl(connSock, F_SETFL, flags);
        return 0;
    }
    FD_ZERO(&rset);
    FD_SET(connSock, &rset);
    wset = rset;
    tval.tv_sec = timeout;
    tval.tv_usec = 0;
    ret = select(connSock + 1, &rset, &wset, NULL, timeout ? &tval : NULL);
    if (ret == 0) {
        close(connSock);
        errno = ETIMEDOUT;
        return (-1);
    }
    else if (ret < 0) {
        return (-1);
    }
    fcntl(connSock, F_SETFL, flags);
    opt_len = (socklen_t) sizeof(struct sctp_status);
    if (getsockopt(connSock, IPPROTO_SCTP, SCTP_STATUS, &status, &opt_len) == -1)
        return (-1);
    printf("assoc id = %d\n", status.sstat_assoc_id);
    printf("state = %d\n", status.sstat_state);
    printf("instrms = %d\n", status.sstat_instrms);
    printf("outstrms = %d\n", status.sstat_outstrms);
    return 0;
}
int sendMessage(char remote_host[], int remote_port, char data[]) {
    int ret, flags = 0;
    struct sockaddr_in servaddr;
    char reply[1024];
    errno = 0;
    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(remote_port);
    servaddr.sin_addr.s_addr = inet_addr(remote_host);
    printf("\nSending %s (%zu bytes)", data, strlen(data));
    ret = sctp_sendmsg(connSock, (void *) data, (size_t) strlen(data),
                       (struct sockaddr *) &servaddr, sizeof(servaddr), 46, 0, 0, 0, 0);
    if (ret == -1)
    {
        printf("\nError sending errno(%d)", errno);
        return -1;
    }
    else {
        /* leave room for a terminating '\0' so reply can be printed with %s */
        ret = sctp_recvmsg(connSock, (void *) reply, sizeof(reply) - 1, NULL,
                           NULL, NULL, &flags);
        if (ret == -1)
        {
            printf("\nError receiving errno(%d)", errno);
            return -1;
        }
        else {
            reply[ret] = '\0';
            printf("\nServer replied with %s", reply);
            return 0;
        }
    }
}
int getSocket() {
    return connSock;
}
I don't know whether there's anything significant I missed setting before connecting. I pieced the code together from different sources, so it's quite messy.
Another update: here's the tshark log of that code when executed:
3336.919408 local -> remote SCTP 82 INIT
3337.006690 remote -> local SCTP 810 INIT_ACK
3337.006727 local -> remote SCTP 774 COOKIE_ECHO
3337.085390 remote -> local SCTP 50 COOKIE_ACK
3337.086650 local -> remote SCTP 94 DATA
3337.087277 remote -> local SCTP 58 ABORT
3337.165266 remote -> local SCTP 50 ABORT
Detailed tshark log of this here.
It looks like the remote sent its COOKIE_ACK chunk but my client failed to set its state to ESTABLISHED (I double-checked the sstat_state value of 3 here).
If the association setup has completed, the state should be SCTP_ESTABLISHED. SCTP_COOKIE_ECHOED indicates that the association has not been fully established: the originating side (your local host in this case) has sent a COOKIE_ECHO chunk (once or several times) that has not yet been acknowledged by a COOKIE_ACK from the remote end.
You can send messages in this state; SCTP will simply buffer them until it receives the COOKIE_ACK and then send them.
It is hard to say what went wrong based on the information you provided. At this stage it is probably worth digging into the Wireshark trace to see how the remote side replies to your COOKIE_ECHO.
Also, if you can share your client and server code, that might help identify the root cause.
UPDATE #1:
It should also be noted that the remote application can abort the association itself (e.g. if this association is not configured on that server). If you are trying to connect to an arbitrary server (rather than one you control), that is quite possible and actually makes sense in your case. The state of the association on your side is then COOKIE_ECHOED because the COOKIE_ACK has not arrived yet (just a race condition). As I said previously, SCTP happily accepts your data in this state and buffers it until it receives the COOKIE_ACK. SCTP on the remote side sends the COOKIE_ACK straight away, even before the application gets control back from accept(). If the application then decides to terminate the association ungracefully, it sends an ABORT (the first ABORT in your trace). Your side has not received this ABORT yet and sends a DATA chunk. Since the remote side considers the association already terminated, it cannot process the DATA chunk, so it treats it as an out-of-the-blue packet (see RFC 4960, section 8.4) and sends another ABORT with the T bit set to 1.
I suspect this is what happened in your case. You can confirm it easily by looking at the Wireshark trace.
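Separately, the client code in the question calls select() after the non-blocking connect() but never checks whether the connection attempt actually succeeded. A common pattern for stream sockets is to wait for writability and then read SO_ERROR. The sketch below assumes that this also applies to the one-to-one (SOCK_STREAM) SCTP socket used in the question, uses poll() instead of the question's select(), and uses a hypothetical helper name, finish_nonblocking_connect. It does not by itself guarantee SCTP_ESTABLISHED, since, as explained above, SCTP accepts data in the COOKIE_ECHOED state.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>   /* pipe() for the usage example */

/* Call after connect() on a non-blocking socket failed with EINPROGRESS.
 * Waits up to timeout_ms for the socket to become writable, then reads
 * SO_ERROR to find out whether the connection attempt succeeded.
 * Returns 0 on success, -1 with errno set otherwise. */
int finish_nonblocking_connect(int sock, int timeout_ms)
{
    struct pollfd pfd = { .fd = sock, .events = POLLOUT, .revents = 0 };
    int ret, soerr = 0;
    socklen_t len = sizeof soerr;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);   /* retry if interrupted by a signal */
    if (ret < 0)
        return -1;
    if (ret == 0) {
        errno = ETIMEDOUT;
        return -1;
    }

    /* Writability alone does not mean success: the pending error says. */
    if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0)
        return -1;
    if (soerr != 0) {
        errno = soerr;   /* e.g. ECONNREFUSED, ETIMEDOUT */
        return -1;
    }
    return 0;
}
```

In the question's connect(), this would replace the select() block; on success, the SCTP_STATUS query can follow as before.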

Implementation of Poll Mechanism in Char Device Driver

Hello, dear participants of Stack Overflow,
I'm new to kernel-space development and still at the beginning of the road.
I developed a basic char device driver that supports read, open, close, etc., but I couldn't find a proper source or how-to tutorial with a sample of the poll/select mechanism.
I've written the sample code for poll function below:
static unsigned int dev_poll(struct file *file, poll_table *wait)
{
    poll_wait(file, &dev_wait, wait);
    if (size_of_message > 0) {
        printk(KERN_INFO "size_of_message > 0 returning POLLIN | POLLRDNORM\n");
        return POLLIN | POLLRDNORM;
    }
    else {
        printk(KERN_INFO "dev_poll return 0\n");
        return 0;
    }
}
It works fine, but I couldn't understand a few things.
When I call select from a user-space program as
struct timeval time = { 5, 0 };
select(fd + 1, &readfs, NULL, NULL, &time);
the dev_poll function in the driver is called once and returns zero or POLLIN depending on the buffer size, and then it is never called again. In user space, if dev_poll returned 0, the program continues after 5 seconds.
What I couldn't understand is this: if dev_poll is called once and returns immediately, how does the driver let the user-space program know that something readable arrived in the buffer within those 5 seconds?
Is there any way for a kernel module to get the timeval parameter that comes from user space?
Thanks in advance.
Regards,
Calling poll_wait() places a wait entry into the wait queue passed as its second argument. When that wait queue is woken (via wake_up() or a similar function), the driver's poll function is evaluated again.
The driver doesn't need to bother with timeouts: when the timeout expires, the wait entry is removed from the wait queue automatically.
Hello to everyone who, like me, is curious about poll. I came up with a solution.
In another Stack Overflow thread, someone said that the driver's poll function is called multiple times, whenever the kernel needs the device's latest state. So basically I implemented this:
When poll is called, call poll_wait() with the wait queue head.
When the device has buffered data (this is usually in the driver's write function), call the wake_up macro with the same wait queue head.
After this step, the driver's poll function is called again,
so there you can return whatever readiness mask applies, in this case POLLIN | POLLRDNORM.
Here is my sample code for write and poll in the driver.
static unsigned int dev_poll(struct file *file, poll_table *wait)
{
    static int dev_poll_called_count = 0;
    dev_poll_called_count++;
    poll_wait(file, &dev_wait, wait);
    read_wait_queue_length++;
    printk(KERN_INFO "Inside dev_poll called time is : %d read_wait_queue_length %d\n", dev_poll_called_count, read_wait_queue_length);
    printk(KERN_INFO "After poll_wait wake_up called\n");
    if (size_of_message > 0) {
        printk(KERN_INFO "size_of_message > 0 returning POLLIN | POLLRDNORM\n");
        return POLLIN | POLLRDNORM;
    }
    else {
        printk(KERN_INFO "dev_poll return 0\n");
        return 0;
    }
}
static ssize_t dev_write(struct file *filep, const char __user *buffer, size_t len, loff_t *offset){
    int ret;
    printk(KERN_INFO "Inside write\n");
    /* Leave room for the terminating '\0' (assumes message is a fixed-size array). */
    if (len > sizeof(message) - 1)
        len = sizeof(message) - 1;
    ret = copy_from_user(message, buffer, len);
    if (ret)
        return -EFAULT;
    size_of_message = len;
    printk(KERN_INFO "EBBChar: Received %zu characters from the user\n", len);
    message[len] = '\0';
    printk(KERN_INFO "received string %s\n", message);
    if (read_wait_queue_length)
    {
        wake_up(&dev_wait);   /* makes the kernel call dev_poll() again for waiting readers */
        read_wait_queue_length = 0;
    }
    return len;
}

Flexible socket application

I'm writing a game that is played over a LAN using sockets. I use a 4-byte length prefix to know how much data follows, like this:
void trust_recv(int sock, int length, char *buffer)
{
    int recved = 0;
    int justRecv;
    while (recved < length) {
        justRecv = recv(sock, buffer + recved, length - recved, 0);
        if (justRecv <= 0) return;   // error (< 0) or peer closed the connection (0)
        recved += justRecv;
    }
}
void onDataArrival(int sock)
{
    int length;
    char *data;
    trust_recv(sock, 4, (char *) &length);
    data = new char[length];
    trust_recv(sock, length, data);
    do_somethings_with_data(data);
    delete[] data;
}
The problem is that if someone (an intruder or attacker, for example) sends data in a different format (say only 2 bytes, or fewer bytes than the 4-byte prefix announces), or if there is a network problem, my application goes into a "not responding" state and has to be closed (because I use blocking sockets). How can I make my socket application more robust against this without switching the socket to non-blocking mode? (Ideas for organizing the data or other algorithms are welcome as well.)
You can set a receive timeout during the socket setup phase with a setsockopt() call and the SO_RCVTIMEO option:
struct timeval tv;
tv.tv_sec = 8;
tv.tv_usec = 0;
if (setsockopt(your_sock_fd, SOL_SOCKET, SO_RCVTIMEO, (char *)&tv, sizeof tv) < 0)
    perror("setsockopt error");
Then test the return value of recv() and its errno:
if (justRecv < 0)
{
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        perror("TIMEOUT!");
    return;
}
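Putting the timeout together with the length prefix, here is a minimal sketch, in C rather than the question's C++, of a receive path that gives up on a timeout, on an early disconnect, and on an implausible length. It assumes the sender writes the 4-byte prefix in network byte order (the question's code reads it in host order), and MAX_MESSAGE is a hypothetical limit. The names set_recv_timeout, recv_all and recv_length_prefix are illustrative, not from the question.

```c
#include <arpa/inet.h>   /* ntohl() */
#include <errno.h>
#include <stdint.h>
#include <string.h>      /* memcmp() for the usage example */
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>      /* write(), close() for the usage example */

#define MAX_MESSAGE 65536u  /* hypothetical upper bound on one message body */

/* Make recv() on a blocking socket give up after `seconds` seconds. */
int set_recv_timeout(int sock, int seconds)
{
    struct timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Read exactly len bytes. Returns 0 on success, -1 on error, on timeout
 * (errno EAGAIN or EWOULDBLOCK) or if the peer closed the connection early. */
int recv_all(int sock, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(sock, (char *)buf + got, len - got, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: try again */
            return -1;           /* real error or SO_RCVTIMEO expired */
        }
        if (n == 0) {
            errno = ECONNRESET;  /* illustrative choice for an early EOF */
            return -1;
        }
        got += (size_t)n;
    }
    return 0;
}

/* Read a 4-byte length prefix (assumed to be in network byte order) and
 * reject lengths above MAX_MESSAGE. Returns the length, or -1. */
long recv_length_prefix(int sock)
{
    uint32_t netlen, len;
    if (recv_all(sock, &netlen, sizeof netlen) != 0)
        return -1;
    len = ntohl(netlen);
    if (len > MAX_MESSAGE) {
        errno = EMSGSIZE;
        return -1;
    }
    return (long)len;
}
```

A caller would allocate the body only after recv_length_prefix() succeeds, then call recv_all() for the body and drop the connection on any -1. Bounding the length keeps a malicious prefix from triggering a huge allocation, and the timeout keeps a short message from blocking the game forever.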