Possible to see tracing when using cat or vi to open a text file - bpf

Is it possible to trace what is being read from a text file using eBPF? There are ways to see the amount of memory being used and to count reads and writes, but I would also like to output the user data itself, using bpf_trace_printk if possible.

I think this would require tracing the open() (or openat()) system call and correlating it (the fd in particular) with the traced read() calls.
/sys/kernel/debug/tracing/events/syscalls/sys_enter_read/format defines which syscall arguments can be accessed. What may interest you is the char *buf buffer pointer, where read() places the bytes it has read.
However, the entry tracepoint fires before any bytes have been copied into that buffer, so a more reliable way is to grab the buffer pointer on entry and dump its contents when read() returns, e.g. via the sys_exit_read tracepoint or a raw tracepoint (BPF_PROG_TYPE_RAW_TRACEPOINT) hooked at read() return.
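A minimal sketch of that approach with BCC (Python): it stashes the buf pointer at sys_enter_read and prints the start of the buffer at sys_exit_read, once the data is there. The 64-byte dump, the map layout, and bpf_probe_read_user (kernel 5.5+; older kernels would use bpf_probe_read) are choices of this sketch, not requirements.
from bcc import BPF

prog = r"""
BPF_HASH(bufs, u32, u64);

TRACEPOINT_PROBE(syscalls, sys_enter_read) {
    u32 tid = bpf_get_current_pid_tgid();   // lower 32 bits are the thread id
    u64 buf = (u64)args->buf;               // userspace buffer passed to read()
    bufs.update(&tid, &buf);
    return 0;
}

TRACEPOINT_PROBE(syscalls, sys_exit_read) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *bufp = bufs.lookup(&tid);
    if (!bufp)
        return 0;
    if (args->ret > 0) {                    // read() succeeded, the data is in the buffer now
        char data[64] = {};
        bpf_probe_read_user(data, sizeof(data) - 1, (void *)*bufp);
        bpf_trace_printk("read: %s\n", data);
    }
    bufs.delete(&tid);
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing read() buffers, Ctrl-C to stop")
b.trace_print()   # streams bpf_trace_printk output from trace_pipe
In practice you would filter by PID or command name (e.g. only cat or vi); as written this prints the start of every successful read() on the system, and binary data will only show up to the first NUL byte.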

Related

Common Lisp: flush standard output

Trying to learn lisp (and I guess emacs along with it).
I was wondering how you would go about clearing the output and replacing it.
Could be in a LISP repl, or an emacs buffer.
Something akin to the following in python.
import sys

def go(r):
    for i in range(r):
        sys.stdout.write("\rDoing %i" % i)
        sys.stdout.flush()
For Common Lisp, you are looking for
Functions FINISH-OUTPUT, FORCE-OUTPUT, CLEAR-OUTPUT:
finish-output, force-output, and clear-output exercise control over the internal handling of buffered stream output.
finish-output attempts to ensure that any buffered output sent to output-stream has reached its destination, and then returns.
force-output initiates the emptying of any internal buffers but does not wait for completion or acknowledgment to return.
clear-output attempts to abort any outstanding output operation in progress in order to allow as little output as possible to continue to the destination.
and
Variables *DEBUG-IO*, *ERROR-OUTPUT*, *QUERY-IO*, *STANDARD-INPUT*, *STANDARD-OUTPUT*, *TRACE-OUTPUT*
The value of *debug-io*, called debug I/O, is a stream to be used for interactive debugging purposes.
The value of *error-output*, called error output, is a stream to which warnings and non-interactive error messages should be sent.
The value of *query-io*, called query I/O, is a bidirectional stream to be used when asking questions of the user. The question should be output to this stream, and the answer read from it.
The value of *standard-input*, called standard input, is a stream that is used by many operators as a default source of input when no specific input stream is explicitly supplied.
The value of *standard-output*, called standard output, is a stream that is used by many operators as a default destination for output when no specific output stream is explicitly supplied.
The value of *trace-output*, called trace output, is the stream on which traced functions (see trace) and the time macro print their output.
Emacs Lisp is quite different; you might want to start here: https://www.gnu.org/software/emacs/manual/html_node/elisp/Output-Functions.html

Matlab read from fifo with fopen timeout

I'm working with named pipes (FIFOs) to communicate between Python and MATLAB. The MATLAB code that reads from the pipe works, but it hangs if nothing has been written to the FIFO. I would prefer it to time out gracefully when no data is available.
If the pipe exists (in bash):
$ mkfifo pipe_read
but has no data, the MATLAB open command:
>> fid = fopen('pipe_read', 'r');
hangs until data is available:
$ echo "test data" >> pipe_read
Rather than blocking forever, I would like fopen to return a file ID that indicates an error (similar to the -1 it returns when the file does not exist) if there is no data available.
Could there be a solution similar to the asynchronous reads available in the commands for writing and reading to serial instruments: http://www.mathworks.com/help/matlab/ref/readasync.html ?
Or possibly fopen could be embedded into a matlab timer object that enables a timeout?
This has been asked before but without an answer:
Matlab read from named pipe (fifo)
I'm pretty sure the issue is not actually with Matlab's fopen, but with the underlying open system call. Generally, the use of a pipe or FIFO only makes sense when both a reader and a writer exist, so by default open(2) will block until the other end of the FIFO has been opened as well.
I don't think it will work to embed the fopen call in any other Matlab object. As far as I'm aware, the only way to circumvent this is to write your own version of fopen, as a specialized Mex function. In this case, you can make a call to open(2) with the O_NONBLOCK flag or'd with whatever read/write flag you'd like. But digging around in man 2 open, under the ERRORS section, you can see that ENXIO is returned if "O_NONBLOCK and O_WRONLY are set, the file is a FIFO, and no process has it open for reading". That means you need to make sure that Python has opened the FIFO for reading before Matlab tries to open for writing (or vice versa).
As a final point, keep in mind that Matlab's fopen returns a handle to a file descriptor. Your Mex function should probably mirror that, so you can pass it around to fread/fscanf/etc without issues.
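Not MATLAB, but since the other end of this FIFO is Python anyway, here is a rough sketch of the open(2) behaviour such a Mex function would build on (the path pipe_read is the one from the question):
import os

# O_NONBLOCK makes open() return immediately instead of waiting for a writer.
fd = os.open("pipe_read", os.O_RDONLY | os.O_NONBLOCK)
try:
    data = os.read(fd, 4096)
    if data:
        print("got:", data)
    else:
        print("FIFO exists but no writer has connected yet")   # read() sees EOF
except BlockingIOError:
    print("writer connected but no data yet (EAGAIN)")
finally:
    os.close(fd)

# The reverse case from the man page excerpt: os.O_WRONLY | os.O_NONBLOCK with
# no reader fails immediately with ENXIO instead of blocking.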
In Linux, a system call with timeout would do the trick. For example:
timeout = 5; % timeout in seconds
pipe = 'pipe_read';
[exit_code,str] = system(sprintf('timeout %ds cat %s', timeout, pipe));
switch exit_code
    case 0;   doSomething(str);  % found data
    case 124; doTimeout();       % timed out
end
MacOS has gtimeout which I assume is similar.

Why aren't buffers auto-flushed by default?

I recently had the privilege of setting $| = 1; inside my Perl script to help it talk faster with another application across a pipe.
I'm curious as to why this is not the default setting. In other words, what do I lose out on if my buffer gets flushed straightaway?
Writing to a file descriptor is done via system calls, and system calls are slow.
Buffering a stream and flushing it only once some amount of data has been written is a way to save some system calls.
Benchmark it and you will understand.
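As a rough illustration (a Python sketch rather than Perl; /dev/null and the 200,000-line count are arbitrary choices):
import time

def bench(flush):
    # Write the same lines to /dev/null, with and without a flush per line;
    # each flush() forces a write() system call instead of filling the buffer.
    start = time.perf_counter()
    with open("/dev/null", "w") as out:
        for i in range(200000):
            out.write("Doing %d\n" % i)
            if flush:
                out.flush()
    return time.perf_counter() - start

print("block-buffered: %.3fs" % bench(False))
print("flush per line: %.3fs" % bench(True))
The gap between the two numbers is essentially the cost of the extra write() system calls.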
Buffering depends on the device type of the output handle: ttys are line-buffered; pipes, sockets, and disk files are block-buffered.
This is just basic programming. It’s not a Perl thing.
The fewer times the I/O buffer is flushed, the faster your code is in general (since it doesn't have to make a system call as often). So your code spends more time waiting for I/O by enabling auto-flush.
In a program driven purely by network I/O, enabling auto-flush obviously makes sense. However, in the most common use cases, the default buffering (line-buffered for TTYs, block-buffered otherwise) lets the program flush less often and spend more time doing CPU work. The average user wouldn't notice the difference on a terminal or in a file.

dd speed issue using ibs/obs

I have a loop where I use dd to copy a stream to a disk. I am using a large block size via 'bs' throughout for speed reasons. However, on one specific line I have to use 'ibs' and 'obs' because my 'seek' location is not a multiple of the 'bs' I use elsewhere.
My question is: Is there a way using dd or any other program/Perl module to write out a blocksize different from the one used to 'seek'?
dd if=/dev/ram1 of=/dev/sdb1 seek=2469396480 ibs=1048576 obs=1 count=1
As you can see above, while the raw data is read in one 1 MB block, I have to write it out in 1-byte segments because I need to seek to a specific location at byte granularity. This makes the write about 1/100th as fast.
Is there a workaround? Or is there a way to do this in Perl without using dd?
Thanks,
Nick
This problem is inherent in dd. If your desired seek location has no factor of suitable magnitude (big enough for good performance but small enough to use as a buffer size) then you're stuck. This happens among other times when your desired seek location is a large prime.
In this specific case, as Mark Mann pointed out, you do have good options: 2469396480 is 2355 blocks of size 1048576, or 1024 blocks of size 2411520, etc... But that's not a generic answer.
To do this generically, you'll want to use something other than dd. Fortunately, dd's task is really simple and all you need is the following (in pseudocode... I haven't done much Perl in a while)
if = open("/dev/ram1", "r")
of = open("/dev/sdb1", "r+")
seek(of, 2469396480)
loop until you have copied the amount of data you want {
    chunk = read(if, min(chunksize, remaining_bytes_to_copy))
    write(of, chunk)
}
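A runnable version of that loop, in Python rather than Perl; the device names, offset, and sizes are taken from the question and of course need adjusting (and root) on a real system:
import os

SRC, DST = "/dev/ram1", "/dev/sdb1"
OFFSET = 2469396480   # byte-granular seek target from the question
TOTAL  = 1048576      # bytes to copy (one 1M input block)
CHUNK  = 1048576      # read size, independent of the seek granularity

src = os.open(SRC, os.O_RDONLY)
dst = os.open(DST, os.O_WRONLY)
os.lseek(dst, OFFSET, os.SEEK_SET)   # seek by bytes; no block-size restriction

remaining = TOTAL
while remaining > 0:
    buf = os.read(src, min(CHUNK, remaining))
    if not buf:
        break                        # source exhausted early
    os.write(dst, buf)
    remaining -= len(buf)

os.close(src)
os.close(dst)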
It looks like the source of your copy is a ramdisk of some sort. If you really want screaming performance, you might try another method besides reading chunks into a buffer and writing the buffer out to the output file. For example you can mmap() the source file and write() directly from the mapped address. The OS may (or may not) optimize away one of the RAM-to-RAM copy operations. Note that such methods will be less portable and less likely to be available in high level languages like Perl.
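And a sketch of the mmap() route, assuming /dev/ram1 accepts mmap (block devices on Linux generally do) and reusing the same numbers:
import mmap, os

src = os.open("/dev/ram1", os.O_RDONLY)
dst = os.open("/dev/sdb1", os.O_WRONLY)

# Map one 1M block of the source; os.write() accepts the mapping directly,
# so there is no intermediate read() into a separate userspace buffer.
m = mmap.mmap(src, 1048576, prot=mmap.PROT_READ)
os.lseek(dst, 2469396480, os.SEEK_SET)
os.write(dst, m)

m.close()
os.close(src)
os.close(dst)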

Why does writing to an unconnected socket send SIGPIPE first?

There are so many possible errors in the POSIX environment. Why do some of them (like writing to an unconnected socket in particular) get special treatment in the form of signals?
This is by design, so that simple programs producing text (e.g. find, grep, cat) used in a pipeline die when their consumer dies. That is, if you're running a chain like find | grep | sed | head, head will exit as soon as it has read enough lines. That will kill sed with SIGPIPE the next time it writes, which will kill grep with SIGPIPE, which will kill find with SIGPIPE. If there were no SIGPIPE, naively written programs would keep running and producing output that nobody needs.
If you don't want to get SIGPIPE in your program, just ignore it with a call to signal(SIGPIPE, SIG_IGN). After that, system calls like write() that hit a broken pipe will fail with errno set to EPIPE instead.
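For instance, in Python (which already sets SIGPIPE to SIG_IGN at interpreter startup, so the explicit signal() call below just mirrors what you would write in C):
import errno, os, signal

# In C: signal(SIGPIPE, SIG_IGN); CPython effectively does this for you already.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

r, w = os.pipe()
os.close(r)                  # close the read end: nobody is left to consume the data

try:
    os.write(w, b"hello\n")  # with SIGPIPE ignored, this fails instead of killing us
except OSError as e:
    print("write failed with", errno.errorcode[e.errno])   # prints EPIPE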
See this SO answer for a detailed explanation of why writing a closed descriptor / socket generates SIGPIPE.
Why is writing a closed TCP socket worse than reading one?
SIGPIPE isn't specific to sockets; as the name suggests, it is also sent when you try to write to a pipe (anonymous or named). I guess the reason for having separate error-handling behaviour is that a broken pipe shouldn't always be treated as an error (whereas, for example, trying to write to a file that doesn't exist should always be treated as an error).
Consider the program less. This program reads input from stdin (unless a filename is specified) and only shows part of it at a time. If the user scrolls down, it will try to read more input from stdin, and display that. Since it doesn't read all the input at once, the pipe will be broken if the user quits (e.g. by pressing q) before the input has all been read. This isn't really a problem, though, so the program that's writing down the pipe should handle it gracefully.
It's a design decision. Early on, signals were the common way to notify user space of events; later this became less necessary as other patterns, such as polling, grew more popular, since they don't require the caller to install a signal handler.