How do I redefine built-in Perl functions? - perl

I want to do two things:
In production code, I want to redefine the open command to enable me to add automagic file logging. I work on data processing applications/flows and as part of that, it's important for the user to know exactly what files are being processed. If they are using an old version of a file, one way for them to find out is by reading through the list of files being processed.
I could just create a new sub that does this logging and returns a file pointer and use that in place of open in my code.
It would be really nice if I could just redefine open and have pre-existing code benefit from this behavior. Can I do this?
In debug code, I'd like to redefine the printf command to insert comments along with the written output indicating which code generated that line. Again, I have a sub that will optionally do this, but converting my existing code is tedious.

If a CORE subroutine has a prototype* it can be replaced. Replacing a function in the current namespace is simple enough.
#!/usr/bin/perl
use strict;
use warnings;
use subs 'chdir';
sub chdir(;$) {
    my $dir = shift;
    $dir = $ENV{HOME} unless defined $dir;
    print "changing dir to $dir\n";
    CORE::chdir $dir;
}
chdir("/tmp");
chdir;
If you want to override the function for all modules as well you can read the docs.
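For the curious, here is a hedged sketch of that global approach (per the "Overriding Built-in Functions" section of perlsub): a sub installed into the CORE::GLOBAL package at compile time overrides the builtin in every package, not just the current one. The @opened array and the log format are illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our @opened;    # demonstration-only log of filenames passed to open

# Install the override at compile time, before any open calls are compiled.
# Real code would normally do this from a module's import().
BEGIN {
    *CORE::GLOBAL::open = sub (*;$@) {
        push @opened, $_[-1];                      # the "automagic" logging hook
        return CORE::open($_[0], $_[1], $_[2]) if @_ == 3;
        return CORE::open($_[0], $_[1]);           # fall back to 2-arg open
    };
}

my $path = "/tmp/override-demo.$$";
open(my $out, '>', $path) or die "can't open $path: $!";
print $out "hello\n";
close $out;
unlink $path;

print "logged: @opened\n";
```

Every open compiled after the BEGIN block, in any package, now passes through the wrapper, which is what makes this approach suited to retrofitting logging onto pre-existing code.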
* Here is code to test every function in Perl 5.10 (it will work on earlier versions as well). Note: this program will report some functions as impossible to override even though they can be overridden; in those cases, however, the overridden function will not behave the same way as the original.
From perldoc -f prototype:

If the builtin is not overridable (such as qw//) or if its arguments cannot be adequately expressed by a prototype (such as system), prototype() returns undef, because the builtin does not really behave like a Perl function.
#!/usr/bin/perl
use strict;
use warnings;
for my $func (map { split } <DATA>) {
    my $proto;
    # skip functions not in this version of Perl
    next unless eval { $proto = prototype "CORE::$func"; 1 };
    if ($proto) {
        print "$func has a prototype of $proto\n";
    } else {
        print "$func cannot be overridden\n";
    }
}
__DATA__
abs accept alarm atan2 bind
binmode bless break caller chdir
chmod chomp chop chown chr
chroot close closedir connect continue
cos crypt dbmclose defined delete
die do dump each endgrent
endhostent endnetent endprotoent endpwent endservent
eof eval exec exists exit
exp fcntl fileno flock fork
format formline getc getgrent getgrgid
getgrnam gethostbyaddr gethostbyname gethostent getlogin
getnetbyaddr getnetbyhost getnetent getpeername getpgrp
getppid getpriority getprotobyname getprotobynumber getprotoent
getpwent getpwnam getpwuid getservbyname getservbyport
getservent getsockname getsockopt glob gmtime
goto grep hex import index
int ioctl join keys kill
last lc lcfirst length link
listen local localtime lock log
lstat m map mkdir msgctl
msgget msgrcv msgsnd my next
no oct open opendir ord
our pack package pipe pop
pos print printf prototype push
q qq qr quotemeta qw
qx rand read readdir readline
readlink readpipe recv redo ref
rename require reset return reverse
rewinddir rindex rmdir s say
scalar seek seekdir select semctl
semget semop send setgrent sethostent
setnetent setpgrp setpriority setprotoent setpwent
setservent setsockopt shift shmctl shmget
shmread shmwrite shutdown sin sleep
socket socketpair sort splice split
sprintf sqrt srand stat state
study sub substr symlink syscall
sysopen sysread sysseek system syswrite
tell telldir tie tied time
times tr truncate uc ucfirst
umask undef unlink unpack unshift
untie use utime values vec
wait waitpid wantarray warn write
y -r -w -x -o
-R -W -X -O -e
-z -s -f -d -l
-p -S -b -c -t
-u -g -k -T -B
-M -A -C

For open: This worked for me.
use 5.010;
use strict;
use warnings;
use subs 'open';
use Symbol qw<geniosym>;
sub open (*$;@) {
    say "Opening $_[-1]";
    my ( $symb_arg ) = @_;
    my $symb;
    if ( defined $symb_arg ) {
        no strict;
        my $caller = caller();
        $symb = \*{$symb_arg};
    }
    else {
        $_[0] = geniosym;
    }
    given ( scalar @_ ) {
        when ( 2 ) { return CORE::open( $symb // $_[0], $_[1] ); }
        when ( 3 ) { return CORE::open( $symb // $_[0], $_[1], $_[2] ); }
    }
    return $symb;
}
open PERL4_FH, '<', 'D:\temp\TMP24FB.sql';
open my $lex_fh, '<', 'D:\temp\TMP24FB.sql';
For Printf: Did you check out this question? -> How can I hook into Perl’s print?

Related

What type is STDOUT, and how do I optionally write to it?

Does STDOUT have a "type"?
printf STDERR ("STDOUT = %s\n", STDOUT);
printf STDERR ("\*STDOUT = %s\n", *STDOUT);
printf STDERR ("\\\*STDOUT = %s\n", \*STDOUT);
Produces:
STDOUT = STDOUT
*STDOUT = *main::STDOUT
\*STDOUT = GLOB(0x600078848)
I understand the *main::STDOUT and GLOB(0x600078848) entries. The "bareword" one leaves me curious.
I'm asking because I want to pass a file handle-like argument to a method call. In 'C', I'd use a file descriptor or a File *. I want it to default to STDOUT. What I've done is:
$OUT_FILE_HANDLE = \*STDOUT;
if (@ARGV > 0) {
    open($OUT_FILE_HANDLE, ">", "$ARGV[0]") or die $!;
}
It works, but I don't know exactly what I've done. Have I botched up STDOUT? I suspect I have "ruined" (overwritten) STDOUT, which is NOT what I want.
Please pardon the compound question; they seemed related.
Create a lexical filehandle to be a copy of STDOUT and manipulate that as needed
use feature 'say';

sub manip_fh {
    my ($fh) = @_;
    say $fh "hi";                            # goes to STDOUT
    open $fh, '>', 'a_file.txt' or die $!;   # now it's to a file (no "my": reopen the passed handle)
    say $fh "hello";
}

open my $fh, '>&', STDOUT;   # via dup2
manip_fh($fh);
say "hi";   # still goes where STDOUT went before being dup-ed (terminal)
This new, independent, filehandle can then be reopened to another resource without affecting STDOUT. See open.
The $OUT_FILE_HANDLE = \*STDOUT; from the question creates an alias, so STDOUT does indeed get changed when the "new" handle changes. You can see that by printing the typeglob
our $NEW = \*STDOUT; # "our" only for checks here, otherwise better "my"
say *{$main::NEW}; #--> *main::STDOUT
or by printing the IO slot from the symbol table for both
say for *{$main::NEW}{IO}, *{$main::{STDOUT}}{IO};
and seeing (that the object stringifies to) the same (eg IO::File=IO(0x1a8ca50)).
When it's duped using open with mode >& as in the first code snippet (but as a global our), printing the typeglob gives *main::NEW, and its IO::File object is not the same as STDOUT's. (Make it a global our only so that it is in the symbol table for these checks; for real use, a my is much better.)
From perlvar:
Perl identifiers that begin with digits or punctuation characters are exempt from the effects of the package declaration and are always forced to be in package main; they are also exempt from strict 'vars' errors. A few other names are also exempt in these ways: [...] STDOUT
So, STDOUT is a global variable containing a pre-opened file handle.
From perlfunc:
If FILEHANDLE is an undefined scalar variable (or array or hash element), a new filehandle is autovivified, meaning that the variable is assigned a reference to a newly allocated anonymous filehandle. Otherwise if FILEHANDLE is an expression, its value is the real filehandle.
Your $OUT_FILE_HANDLE is not undefined, so it is its value, STDOUT, that is being opened. AFAIK, if you open an already open handle, it is implicitly closed first.
There are several ways to do what you want. The first is obvious from the above quote — do not define $OUT_FILE_HANDLE before the open:
if (@ARGV > 0) {
    open($OUT_FILE_HANDLE, ">", "$ARGV[0]") or die $!;
} else {
    $OUT_FILE_HANDLE = \*STDOUT;
}
# do stuff to $OUT_FILE_HANDLE
Another is to use select, so you don't need to pass a file handle:
if (@ARGV > 0) {
    open($OUT_FILE_HANDLE, ">", "$ARGV[0]") or die $!;
    select $OUT_FILE_HANDLE;
}
# do stuff (without specifying a file handle)
select STDOUT;
This part of your question wasn't answered:
The "bareword" one leaves me curious.
An identifier with no other meaning is a string literal that produces itself. For example, foo is the same as 'foo'.
$ perl -e'my $x = foo; print "$x\n";'
foo
This is error-prone, so we use use strict qw( subs ); to prevent this.
$ perl -e'use strict; my $x = foo; print "$x\n";'
Bareword "foo" not allowed while "strict subs" in use at -e line 1.
Execution of -e aborted due to compilation errors.
See this for other meanings Perl could assign.

Function definition changes the outward behavior of <>

IMPORTANT: The motivation for this question is not to solve a problem, but to understand Perl's behavior.
Consider the following toy script:
#!/usr/bin/env perl
use strict;
sub main {
    @ARGV >= 2 or die "$0: not enough arguments\n";
    my $arg_a = shift @ARGV;
    my $arg_b = shift @ARGV;
    while ( <> ) {
        print "+++ $_";
    }
}
main();
__END__
This script takes 2 or more arguments (which it doesn't use). All it does is to echo (with a +++ prefix) either its standard input or the contents of however many files are specified as its third, fourth, etc., arguments.
So far the code behaves as I expect it to.
Now consider this slightly modified version:
#!/usr/bin/env perl
use strict;
sub slurp {
    local $/ = undef;
    local @ARGV = @_;
    return <>;
}
sub main {
    @ARGV >= 2 or die "$0: not enough arguments\n";
    my $arg_a = shift @ARGV;
    my $arg_b = shift @ARGV;
    my $content = slurp( $arg_a );
    while ( <> ) {
        print "+++ $_";
    }
}
main();
__END__
This version of the script does not ignore its first argument; rather, it interprets it as the path to a file and reads that file's contents into the variable $content (which it subsequently ignores). Other than this, the script should behave exactly as before.
Unfortunately, this version of the script no longer echoes its stdin (though it still does echo the contents of its 3rd, 4th, etc. arguments).
I know that the problem has to do with the way the slurp function is implemented, because if I change this implementation to
sub slurp {
    local $/ = undef;
    open my $input, $_[0] or die "$!";
    return <$input>;
}
...then the script once more echoes its stdin, when available.
I would like to understand why the first version of slurp causes the script to stop working as expected.
You need to exhaust the iterator (by calling it until it returns undef) before it considers using STDIN again.
sub slurp {
    local $/ = undef;
    local @ARGV = @_;
    my $rv = <>;   # Read file specified by $_[0].
    1 while <>;    # Exhaust the iterator.
    return $rv;
}
or
sub slurp {
    local $/ = undef;
    local @ARGV = @_;
    my $rv = "";
    while (my $file = <>) {
        $rv .= $file;
    }
    return $rv;   # Concatenation of all files specified by @_.
}
For <> to read from STDIN it has to be invoked when @ARGV is empty. If there are filenames in @ARGV when <> runs, they are removed from it as the files are read, and you would then need to call <> again in order for it to wait on STDIN.
perl -wE'if (@ARGV) { print while <> }; print while <>' file
It is the second print while <> that waits on STDIN (without it file is printed and program exits).
That could in principle happen with your sub: if it read all the files from @ARGV, then once control is back at the <> invocation in main, that one would wait for STDIN.
However, your sub localizes @ARGV (good practice!), so once it exits, the global @ARGV still has what it did in the beginning.† Then the while in main reads those files (again), gets the one undef it is due at the end of the last file, and exits.
One way to see this: empty @ARGV after the sub that reads input is called and before the while in main. Then that while will wait on STDIN again, regardless of the sub. Like
perl -wE'
    sub ri { local @ARGV = @_; return <> };
    print for ri(@ARGV);
    say "argv: @ARGV";
    @ARGV = ();
    print while <>
' file
(A detail to note is that your example seems to take two files while the sub deals with one, so even if the sub were to use the global @ARGV (not local-ized) and remove a file from it, there would still be one file left to occupy the while in main. So you still wouldn't get STDIN.)
Another way to see all this: add another print while <> at the end; that one will wait on STDIN.
This is all described in I/O Operators (perlop), albeit it requires quite close reading.
† With local $GLOBAL_VAR, the value of $GLOBAL_VAR is copied away and restored when the scope is exited. So local protects the global variable from changes within its scope.
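The footnote's point is easy to demonstrate with any package variable standing in for @ARGV (a minimal sketch; $global and the sub names are made up for illustration):

```perl
use strict;
use warnings;

our $global = "original";

# Dynamic scoping: a sub called from within the local-ized scope
# sees the temporary value, not the original one.
sub inner { print "inner sees: $global\n" }

sub outer {
    local $global = "temporary";   # value copied away here ...
    inner();
}                                  # ... and restored when outer() exits

outer();                      # prints "inner sees: temporary"
print "after: $global\n";     # prints "after: original"
```

This is exactly what happens to @ARGV in the slurp sub above: code called inside the sub sees the local-ized copy, and the caller's @ARGV is untouched afterwards.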

Perl script giving unpredictable results

I am very new to Perl. I wrote a script to display user names from the Linux passwd file.
It displays the list of user names, but then it also displays user ids (which I am not trying to display at the moment), and at the end it prints "List of users ids and names:", which it should display before the list of names.
Any idea why it is behaving like this?
#!/usr/bin/perl
@names=system("cat /etc/passwd | cut -f 1 -d :");
@ids=system("cat /etc/passwd | cut -f 3 -d :");
$length=@ids;
$i=0;
print "List of users ids and names:\n";
while ($i < $length) {
    print $names[$i];
    $i += 1;
}
Short answer: system doesn't return output from a command; it returns the exit value. As the output of the cut isn't redirected, it prints to the current STDOUT (e.g. your terminal). Use open or qx// quotes (aka backticks) to capture output:
@names = `cat /etc/passwd | cut -f 1 -d :`;
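A pipe open does the same job without involving the shell at all; here is a sketch using the list form of open (mode '-|' followed by the command and its arguments), which runs cut directly:

```perl
use strict;
use warnings;

# Read cut's output through a pipe; the list form bypasses the shell,
# so no sh process is started and no quoting issues arise.
open my $cut, '-|', 'cut', '-f', '1', '-d', ':', '/etc/passwd'
    or die "can't run cut: $!";
my @names = <$cut>;
close $cut or warn "cut exited with status $?";
chomp @names;

print "$_\n" for @names;
```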
As you are still learning Perl, here is a write-up detailing how I'd solve that problem:
First, always use strict; use warnings; at the beginning of your script. This helps prevent and detect many problems, which makes it invaluable.
Next, starting a shell when everything could be done inside Perl is inefficient: your solution starts six unnecessary processes (two each of sh, cat, and cut). In fact, cat is useless even in the shell version; just use a shell redirection operator: cut ... </etc/passwd.
To open a file in Perl, we'll do
use autodie; # automatic error handling
open my $passwd, "<", "/etc/passwd";
The "<" is the mode (here: reading). The $passwd variable now holds a file handle from which we can read lines like <$passwd>. The lines still contain a newline, so we'll chomp the variable (remove the line ending):
while (<$passwd>) {   # the <> operator reads into $_ by default
    chomp;            # defaults to $_
    ...
}
The split builtin takes a regex that matches separators, a string (defaults to $_), and an optional limit. It returns a list of fields. To split a string on the : separator, we do
my #fields = split /:/;
The left hand side doesn't have to be an array, we can also supply a list of variables. This matches the list on the right, and assigns one element to each variable. If we want to skip a field, we name it undef:
my ($user, undef, $id) = split /:/;
Now we just want to print the user. We can use the print command for that:
print "$user\n";
From Perl 5.10 on, we can use the say feature. It behaves exactly like print, but auto-appends a newline to the output:
say $user;
And voilà, we have our final script:
#!/usr/bin/perl
use strict; use warnings; use autodie; use feature 'say';
open my $passwd, "<", "/etc/passwd";
while (<$passwd>) {
    chomp;
    my ($user, undef, $id) = split /:/;
    say $user;
}
Edit for antique perls
The autodie module was first distributed as a core module with Perl 5.10.1. Also, the say feature isn't available before 5.10.
Therefore, we must use print instead of say and do manual error handling:
#!/usr/bin/perl
use strict; use warnings;
open my $passwd, "<", "/etc/passwd" or die "Can't open /etc/passwd: $!";
while (<$passwd>) {
    chomp;
    my ($user, undef, $id) = split /:/;
    print "$user\n";
}
The open returns a false value when it fails. In that case, the $! variable will hold the reason for the error.
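That failure path is worth seeing once (a minimal sketch; the path is deliberately nonexistent):

```perl
use strict;
use warnings;

# open returns false on failure and sets $! to the OS error message.
my $ok = open(my $fh, '<', '/no/such/file');
if (!$ok) {
    print "open failed: $!\n";   # e.g. "No such file or directory"
}
```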
For reading system databases you should use the proper system functions:
use feature qw(say);
while (
    my ($name, $passwd, $uid, $gid, $quota,
        $comment, $gcos, $dir, $shell, $expire
    )
    = getpwent
) {
    say "$uid $name";
}
If you're scanning the entire password file, you can use getpwent():
while ( my @pw = getpwent() ) {
    print "@pw\n";
}
See perldoc -f getpwent.

How to add -v switch without needing a switch for the filename using GetOpt?

I want to be able to run my script in one of two ways:
perl script.pl file
perl script.pl -v file
I know how to do
perl script.pl -v -f file
But I want to do it without needing the -f for file.
After GetOptions runs, the remaining items in @ARGV are your positional parameters. You just need to use $ARGV[0] for "file".
use Getopt::Long;
my $verbose = 0;
my %opts = ( 'verbose' => \$verbose );
GetOptions(\%opts, 'verbose|v') or die "Incorrect options";
my $file = $ARGV[0];
die "You must provide a filename" unless length $file;
You can use Getopt::Long's argument callback:
use Getopt::Long;
my $file;
GetOptions(
    'v'  => \my $v,
    '<>' => sub { $file = shift },
);
print "\$v: $v\n";
print "\$file: $file\n";
The command perl script.pl -v foo.txt outputs:
$v: 1
$file: foo.txt
Getopt::Long parses (by default) the items in @ARGV and removes them one by one as it processes them. After GetOptions finishes, the first item remaining in @ARGV will be the file name:
use warnings;
use strict;
use Getopt::Long;
my $verbose;
GetOptions (
    'v' => \$verbose,
) or die qq(Invalid arguments passed);
my $file = shift; #Assuming a single file. Could be multiple
if ( $verbose ) {
    print "Do something verbosely\n";
}
else {
    print "Do it the normal way...\n";
}
Nothing special is needed. You allow GetOptions to handle the -v parameter if it exists, and you allow @ARGV to contain all of the parameters that are left after GetOptions finishes executing.
By the way, you could do this:
GetOptions (
    'verbose' => \$verbose,
) or die qq(Invalid arguments passed);
And you could use:
perl script.pl -v file
or
perl script.pl -verbose file
or
perl script.pl -verb file
Because, by default, GetOptions abbreviates option names automatically (the auto_abbrev option) and figures out which parameter the user is attempting to pass.
I highly recommend you look at the documentation and play around with it a bit. There will be a lot of stuff that won't quite make sense to you, but this is probably one of the earliest modules that new Perl programmers start to use, and it is full of all sorts of neat stuff.
And keep going back and reread the documentation as your skills develop because you'll find new stuff in this module as your understanding of Perl increases.
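One concrete knob worth knowing early: the abbreviation behaviour can be switched off. A hedged sketch using Getopt::Long::Configure (the option names are the module's own; the printed strings are made up):

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptions);

# With no_auto_abbrev, "-verb" no longer matches "verbose";
# only exact option names and declared aliases are accepted.
Getopt::Long::Configure('no_auto_abbrev');

my $verbose;
GetOptions('verbose|v' => \$verbose) or die "Invalid arguments\n";

print $verbose ? "verbose mode\n" : "normal mode\n";
```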

How to read to and write from a pipe in Perl?

I need to modify an existing Perl program. I want to pipe a string (which can contain multiple lines) through an external program and read the output from this program. This external program is used to modify the string. Let's simply use cat as a filter program. I tried it like this, but it doesn't work. (Output of cat goes to STDOUT instead of being read by perl.)
#!/usr/bin/perl
open(MESSAGE, "| cat |") or die("cat failed\n");
print MESSAGE "Line 1\nLine 2\n";
my $message = "";
while (<MESSAGE>)
{
    $message .= $_;
}
close(MESSAGE);
print "This is the message: $message\n";
I've read that this isn't supported by Perl because it may end up in a deadlock, and I can understand it. But how do I do it then?
You can use IPC::Open3 to achieve bi-directional communication with child.
use strict;
use IPC::Open3;
my $pid = open3(\*CHLD_IN, \*CHLD_OUT, \*CHLD_ERR, 'cat');
# Note: open3 throws an exception on failure rather than returning false.
my $r;
for (my $i = 1; $i < 10; $i++) {
    print CHLD_IN "$i\n";
    $r = <CHLD_OUT>;
    print "Got $r from child\n";
}
This involves system programming, so it’s more than a basic question. As written, your main program doesn’t require full-duplex interaction with the external program. Dataflow travels in one direction, namely
string → external program → main program
Creating this pipeline is straightforward. Perl’s open has a useful mode explained in the “Safe pipe opens” section of the perlipc documentation.
Another interesting approach to interprocess communication is making your single program go multiprocess and communicate between—or even amongst—yourselves. The open function will accept a file argument of either "-|" or "|-" to do a very interesting thing: it forks a child connected to the filehandle you’ve opened. The child is running the same program as the parent. This is useful for safely opening a file when running under an assumed UID or GID, for example. If you open a pipe to minus, you can write to the filehandle you opened and your kid will find it in his STDIN. If you open a pipe from minus, you can read from the filehandle you opened whatever your kid writes to his STDOUT.
This is an open that involves a pipe, which gives nuance to the return value. The perlfunc documentation on open explains.
If you open a pipe on the command - (that is, specify either |- or -| with the one- or two-argument forms of open), an implicit fork is done, so open returns twice: in the parent process it returns the pid of the child process, and in the child process it returns (a defined) 0. Use defined($pid) or // to determine whether the open was successful.
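A minimal sketch of that quoted behaviour, stripped of everything else: one open, two return values, one code path per process.

```perl
use strict;
use warnings;

# open with mode '-|' and no command forks: the parent gets the
# child's pid and reads the child's STDOUT through $fh; the child
# gets 0 and simply prints.
my $pid = open(my $fh, '-|');
defined $pid or die "can't fork: $!";

if ($pid == 0) {                    # child
    print "hello from the child\n";
    exit 0;
}

my @lines = <$fh>;                  # parent
close $fh;
print "parent read: @lines";
```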
To create the scaffolding, we work in right-to-left order, using open to fork a new process at each step.

1. Your main program is already running.
2. Next, fork a process that will eventually become the external program.
3. Inside the process from step 2:
   - First fork the string-printing process so as to make its output arrive on our STDIN.
   - Then exec the external program to perform its transformation.
4. Have the string-printer do its work and then exit, which kicks up to the next level.
5. Back in the main program, read the transformed result.
With all of that set up, all you have to do is implant your suggestion at the bottom, Mr. Cobb.
#! /usr/bin/env perl
use 5.10.0; # for defined-or and given/when
use strict;
use warnings;
my @transform = qw( tr [A-Za-z] [N-ZA-Mn-za-m] );  # rot13
my @inception = (
    "V xabj, Qnq. Lbh jrer qvfnccbvagrq gung V pbhyqa'g or lbh.",
    "V jnf qvfnccbvagrq gung lbh gevrq.",
);

sub snow_fortress { print map "$_\n", @inception }

sub hotel {
    given (open(STDIN, "-|") // die "$0: fork: $!") {
        snow_fortress when 0;
        exec @transform or die "$0: exec: $!";
    }
}

given (open(my $fh, "-|") // die "$0: fork: $!") {
    hotel when 0;
    print while <$fh>;
    close $fh or warn "$0: close: $!";
}
Thanks for the opportunity to write such a fun program!
You can use the -n commandline switch to effectively wrap your existing program code in a while-loop... look at the man page for -n:
LINE:
while (<>) {
... # your program goes here
}
Then you can use the operating system's pipe mechanism directly
cat file | your_perl_prog.pl
(Edit)
I'll try to explain this more carefully...
The question is not clear about what part the perl program plays: filter or final stage. This works in either case, so I will assume it is the latter.
'your_perl_prog.pl' is your existing code. I'll call your filter program 'filter'.
Modify your_perl_prog.pl so that the shebang line has an added '-n' switch: #!/usr/bin/perl -n or #!/bin/env "perl -n"
This effectively puts a while(<>){} loop around the code in your_perl_prog.pl
add a BEGIN block to print the header:
BEGIN { print "HEADER LINE\n"; }
Inside the implicit loop, each line arrives in $_; process and print it as needed
Then invoke the lot with
cat sourcefile | filter | your_perl_prog.pl
I want to expand on @Greg Bacon's answer without changing it.
I had to do something similar, but wanted to code it without the given/when constructs, and also found that explicit exit() calls were missing: the sample code simply fell through and exited. I also had to make it work under ActiveState Perl, where that approach does not work.
See this question: How to read to and write from a pipe in perl with ActiveState Perl?
#! /usr/bin/env perl
use strict;
use warnings;
my $isActiveStatePerl = defined(&Win32::BuildNumber);
sub pipeFromFork
{
    return open($_[0], "-|") if !$isActiveStatePerl;
    die "ActiveState perl cannot cope with dup file handles after fork";
    # unreachable: the attempted pipe/fork workaround is kept below for reference
    pipe $_[0], my $child or die "cannot create pipe";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid) {    # parent
        close $child;
    } else {       # child
        open(STDOUT, ">&=", $child) or die "cannot clone child to STDOUT";
        close $_[0];
    }
    return $pid;
}
my @transform = qw( tr [A-Za-z] [N-ZA-Mn-za-m] );  # rot13
my @inception = (
    "V xabj, Qnq. Lbh jrer qvfnccbvagrq gung V pbhyqa'g or lbh.",
    "V jnf qvfnccbvagrq gung lbh gevrq.",
);
sub snow_fortress { print map "$_\n", @inception }
sub hotel
{
    my $fh;
    my $pid = pipeFromFork($fh);   # my $pid = open STDIN, "-|";
    defined($pid) or die "$0: fork: $!";
    if (0 == $pid) {
        snow_fortress;
        exit(0);
    }
    open(STDIN, "<&", $fh) or die "cannot clone to STDIN";
    exec @transform or die "$0: exec: $!";
}
my $fh;
my $pid = pipeFromFork($fh); # my $pid = open my $fh, "-|";
defined($pid) or die "$0: fork: $!";
if (0 == $pid) {
    hotel;
    exit(0);
}
print while <$fh>;
close $fh or warn "$0: close: $!";
The simplest way to do what the OP needs, without involving all these cool internals, is to use a temporary file to hold the output until the external processor is done, like so:
open ToTemp, "|/usr/bin/tac>/tmp/MyTmp$$.whee" or die "open the tool: $!";
print ToTemp $TheMessageWhateverItIs;
close ToTemp;
my $Result = `cat /tmp/MyTmp$$.whee`; # or open and read it, or use File::Slurp, etc
unlink "/tmp/MyTmp$$.whee";
Of course, this isn't going to work for something interactive, but co-routines appear to be out of the scope of the original question.