Why aren't buffers auto-flushed by default in Perl?

I recently had the privilege of setting $| = 1; inside my Perl script to help it talk faster with another application across a pipe.
I'm curious as to why this is not the default setting. In other words, what do I lose out on if my buffer gets flushed straightaway?

Writing to a file descriptor is done via system calls, and system calls are slow.
Buffering a stream and flushing it only once some amount of data has been written is a way to save some system calls.

Benchmark it and you will understand.
Whether output is buffered depends on the device type of the output handle: ttys are line-buffered, pipes and sockets are pipe-buffered, and disks are block-buffered.
This is just basic programming. It’s not a Perl thing.

The fewer times the I/O buffer is flushed, the faster your code generally runs, since it doesn't have to make a system call as often. Enabling auto-flush therefore means your code spends more time waiting on I/O.
In a purely network I/O-driven application, that obviously makes more sense. However, in the most common use cases, line-buffered I/O (Perl's default for TTYs) allows the program to flush the buffer less often and spend more time doing CPU work. The average user wouldn't notice the difference on a terminal or in a file.
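As a quick illustration, here is a minimal sketch using the core Benchmark module (the output file paths are arbitrary) that measures the cost of auto-flush directly:
use strict;
use warnings;
use IO::Handle;
use Benchmark qw(cmpthese);

open my $buf,   '>', '/tmp/buffered.out'   or die "open: $!";
open my $unbuf, '>', '/tmp/unbuffered.out' or die "open: $!";
$unbuf->autoflush(1);   # forces a write(2) system call per print

cmpthese(-2, {
    buffered   => sub { print {$buf}   'x' x 80, "\n" },
    unbuffered => sub { print {$unbuf} 'x' x 80, "\n" },
});
On a typical system the buffered handle comes out well ahead, for exactly the system-call reasons described above.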

Related

Stop execution when RAM is filled (i.e. avoid writing to Disk)

I have this problem:
I run some large calculations before going to sleep (or work).
When I return, sometimes RAM is already full and the program has started paging to disk. That is a problem, since the computer becomes almost unresponsive, and the "Interrupt the current operation" button doesn't stop mserver.exe from executing its task.
This is what I saw 10 minutes after I pressed the "Interrupt the current operation" button (screenshot omitted).
Not to mention that calculations are probably 100 or even 1000 times slower once they use disk instead of RAM, so letting them continue is pointless anyway.
Another problem: I was unable to save some variables to a file, since in Maple I couldn't type anything while mserver.exe was executing a task, and after I killed the mserver.exe process I still couldn't save those variables, because Maple commands don't work once the connection to the kernel is lost.
So, my question: can I make mserver.exe avoid using the disk at all (I mean from Maple alone, not by disabling the page file in Windows) and just stop execution automatically when RAM is full, just like Classic Maple does when it hits its 2 GB limit?
It would also be nice to limit Maple's processor usage, for example to 75% or so, so that I could work on that computer without problems.
You might experiment with a few of the options available for specifying limits on the Maple (kernel, mserver) engine.
In particular,
--init-reserve-mem=memorysize
(or, possibly, the -T option); see here for more detail:
https://www.maplesoft.com/support/help/MapleSim/view.aspx?path=maple
On Linux/OSX you could pass that in a call to the maple script that launches Maple. On MS-Windows you could add that to the command string/Property in the launcher (icon).
You might try setting it to a fraction of your total RAM, e.g. 50-75%, and see how it goes. Presumably you'll have some other processes running.
As far as restricting CPU use goes, that's more of an OS issue. On Linux/OSX you could use the system's nice facility. I don't know what's available on MS-Windows (built-in or third-party). You might be able to set the priority of the running mserver process from the Task Manager, or you might look at something like the START facility:
https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/start
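For example, a hypothetical launcher command (the /belownormal switch is one of START's documented priority classes; the Maple path is illustrative only, so adjust it for your installation):
start /belownormal "" "C:\path\to\maplew.exe"
On Linux/OSX, the rough equivalent would be nice -n 10 maple.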

Can I control process priorities through perl

I am wondering if it is possible to control process priorities through Perl.
Basically, I want my Perl script to keep running on my box even when some other process takes up the CPU. The script should either reduce that process's priority or, if the process is taking too much CPU, kill it outright.
I hate to be operating-system specific, but I am trying to design this for a Windows system.
You can use getpriority and setpriority to handle priorities in Perl.
From POSIX::nice():
This is similar to the C function nice(), for changing the scheduling preference of the current process. Positive arguments mean a more polite process, negative values a more needy process. Normal user processes can only be more polite. Returns undef on failure.
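A minimal sketch using those built-ins (they map to the C getpriority(2)/setpriority(2) calls, so this works on POSIX systems; on Windows you would likely need a module such as Win32::Process instead):
use strict;
use warnings;

my $which = 0;    # PRIO_PROCESS
my $who   = $$;   # the current process
my $prio  = getpriority($which, $who);
print "current priority: $prio\n";

# Be more polite; unprivileged processes can only raise this value.
setpriority($which, $who, $prio + 5);
print "new priority: ", getpriority($which, $who), "\n";
Calling POSIX::nice(5) would achieve the same relative change.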

Process communication with signals

I was programming in C, making system calls, and I was wondering the following:
What's an example of where you'd want a process to ignore alarm signals, say if the signal was sent because of a lost packet in intra-network processes?
Many important daemons are very picky about the signals they will respond to; they often install a handler for SIGHUP to re-read their configuration file, use one of SIGUSR1 or SIGUSR2 to indicate the need to close and re-open their log files for log-rotation, and handle SIGINT, SIGQUIT, SIGTERM, etc., in some sort of graceful way.
Everything else should be ignored so that accidental signals do not cause the program to do funny things. The signals that are part of the program's interface should work exactly as designed -- and the other signals should do as little harm as possible.
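In Perl, that policy might look something like the following minimal sketch (reload_config, reopen_logs, and shutdown_gracefully are hypothetical stubs standing in for a real daemon's logic):
use strict;
use warnings;

sub reload_config       { warn "would re-read config here\n" }  # hypothetical
sub reopen_logs         { warn "would rotate logs here\n" }     # hypothetical
sub shutdown_gracefully { warn "shutting down\n"; exit 0 }      # hypothetical

$SIG{HUP}  = \&reload_config;                    # re-read configuration file
$SIG{USR1} = \&reopen_logs;                      # close/re-open logs for rotation
$SIG{TERM} = $SIG{INT} = \&shutdown_gracefully;  # exit cleanly
$SIG{PIPE} = 'IGNORE';                           # stray signals do no harm

sleep 1 while 1;   # the daemon's real work would go here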

Is it bad to open() and close() in a loop if speed isn't an issue?

I modified a Perl script written by another programmer so that it outputs logs. The script goes through files, and for every file it processes I open() the log, write/print to it, and then close() it. This happens many times. I do this to make sure I don't lose any data if the script hangs (it eventually starts doing that, and I'm not knowledgeable enough to fix it), so I don't have a good alternative to repeating open() and close() in that loop.
My main question is this: the Perl script is for personal use, so the speed reduction is not an issue. But are there other bad things that could follow from this likely improper usage of open/close? It may sound like a stupid question, but is it possible this would wear my hard disk down faster, or am I misunderstanding how file handling works?
Thanks in advance.
As others have mentioned, there is no issue here other than performance (and arguably cleanliness of code).
However, if you are merely worried about "losing data if Perl hangs up", just set autoflush on the file handle:
use IO::Handle;
open my $log, '>', 'log.txt'
    or die "Unable to open log.txt for writing: $!";
$log->autoflush(1);
Now every print to $log will be flushed automatically. No need to keep opening and closing.
See the IO::Handle documentation (perldoc IO::Handle) for more information on autoflush.
In theory it's usually better to open and close connections as quickly as possible, and files are no different. The two things you will run into are file locking and performance.
File locking could come about if something else is accessing your file at the same time.
Performance, as you mentioned, isn't a huge concern.
We're not talking about lifetimes of waiting for open/close operations anyway; the cost mostly becomes noticeable under high concurrency or across hundreds of thousands of operations.
The OS mediates hard drive access, so you should be fine; opening and closing a lot of files is OK. The only thing that might happen is that if your script hangs (for some odd reason) while it holds the file handle from open(), you could lose data if it resumes after you've edited the file manually, but that's a pretty rare scenario. And if your script crashes, the descriptors get released anyway, so there's no issue as far as I can tell.

How can I debug a Perl program that suddenly exits?

I have a Perl program based on IO::Async, and it sometimes just exits after a few hours or days without printing any error message whatsoever. There's nothing in dmesg or /var/log either. STDOUT and STDERR both have autoflush(1) set, so data shouldn't be lost in buffers. It doesn't actually return from IO::Async::Loop->loop_forever; a print I put there just to verify that never gets triggered.
Now, one way would be to keep peppering the program with more and more prints and hope one of them gives me a clue. Is there a better way to find out what was going on in a program that exited or silently crashed like this?
One trick I've used is to run the program under strace or ltrace (or attach to the process using strace). Naturally that was under Linux. Under other operating systems you'd use ktrace or dtrace or whatever is appropriate.
A trick I've used for programs that only exhibit sparse issues over days or weeks, and then only on a handful among hundreds of systems, is to direct the output from my tracer to a FIFO, and have a custom program keep only the last 10K lines in a ring buffer, with handlers on SIGPIPE and SIGHUP to dump the current buffer contents into a file. (It's a simple program, but I don't have a copy handy and I'm not going to re-write it tonight; my copy was written for internal use and is owned by a former employer.)
The ring buffer allows the program to run indefinitely without fear of running systems out of disk space; we usually only need a few hundred, or at most a couple thousand, lines of the trace in such matters.
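A minimal sketch of such a ring-buffer reader in Perl (the FIFO and dump-file paths are illustrative only):
use strict;
use warnings;

my $max = 10_000;   # number of lines to keep
my @ring;           # the ring buffer

# Dump the current buffer contents to a file on SIGHUP or SIGPIPE.
my $dump = sub {
    open my $out, '>', "/tmp/trace-dump.$$"
        or die "can't write dump: $!";
    print {$out} @ring;
    close $out;
};
$SIG{HUP} = $SIG{PIPE} = $dump;

open my $fifo, '<', '/tmp/trace.fifo'
    or die "can't open FIFO: $!";
while (my $line = <$fifo>) {
    push @ring, $line;
    shift @ring if @ring > $max;
}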
If you are capturing STDERR, you could start the program as perl -MCarp::Always foo_prog. Carp::Always forces a stack trace on all errors.
A sudden exit without any error message is possibly a SIGPIPE. Traditionally SIGPIPE is used to stop things like the cat command in the following pipeline:
cat file | head -10
It doesn't usually result in anything being printed either by libc or perl to indicate what happened.
Since in an IO::Async-based program you'd not want to silently exit on SIGPIPE, my suggestion would be to put somewhere in the main file of the program a line something like
$SIG{PIPE} = sub { die "Aborting on SIGPIPE\n" };
which will at least alert you to this fact. If instead you use Carp::croak without the trailing \n, you might even be lucky enough to get the file and line number of the syswrite (or similar call) that caused the SIGPIPE.
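That Carp variant would be something like:
use Carp;
$SIG{PIPE} = sub { Carp::croak "Aborting on SIGPIPE" };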