How can I record changes made during in-place editing in Perl?

I've scripted up a simple ksh script that calls Perl to find and replace in files.
The passed-in arg is the home directory:
perl -pi -e 's/find/replace/g' $1/*.html
It works great. However, I'd like to output all the changes to a log file. I've tried piping and redirecting and haven't been able to get it to work. Any ideas?
Thanks,
Glenn

Something like this to send all changes to STDERR:
perl -pi -e '$old = $_; s/find/replace/g and warn "$ARGV\[$.]: $old$_"; close ARGV if eof' $1/*.html
Updated: Fixed $. on multiple files.
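Since warn writes to STDERR, the calling ksh script only needs to redirect that stream to capture the log; a minimal sketch, with changes.log as an illustrative name:
perl -pi -e '$old = $_; s/find/replace/g and warn "$ARGV\[$.]: $old$_"; close ARGV if eof' $1/*.html 2>> changes.log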

You can print to STDERR and redirect just the STDERR output to a file as below:
perl -pi -e 'chomp($prev=$_);s/find/replace/g and print STDERR "$ARGV - $.: $prev -> $_"; close ARGV if eof' $1/*.html 2> logfile.txt
edit: added the filename, and fixed line number display when multiple input files are used

Related

Perl backticks subprocess is causing EOF on STDIN

I'm having an issue with my Perl program that reads from a file (which I open on STDIN and read one line at a time using $line = <>). After I execute a `backtick` command and then go to read the next line from STDIN, I get undef, signaling EOF. I isolated it to the backtick command using debugging code as follows:
my $dir = dirname(__FILE__);
say STDERR "before: tell(STDIN)=" . tell(STDIN) . ", eof(STDIN)=" . eof(STDIN);
say STDERR "\#export_info = `echo nostdin | perl $dir/pythonizer_importer.pl $fullfile`;";
#export_info = `echo nostdin | perl $dir/pythonizer_importer.pl $fullfile`;
say STDERR "after: tell(STDIN)=" . tell(STDIN) . ", eof(STDIN)=" . eof(STDIN);
The output is:
before: tell(STDIN)=15146, eof(STDIN)=
@export_info = `echo nostdin | perl ../pythonizer_importer.pl ./Pscan.pm`;
after: tell(STDIN)=15146, eof(STDIN)=1
I recently added the echo nostdin | to the perl command, which had no effect. How do I run this command and get its STDOUT without messing up my STDIN? BTW, this is all running on Windows; I fire off the main program from a Git Bash, if that matters.
Try locally undefining STDIN before running the backticks command, as the example script below does. Note that any subroutines called from the sub that calls local will see the new value. You can also do open STDIN, "<", "file for child process to read"; after the local *STDIN but before the backticks (see the sketch after the example output below), but remember to close() the file before restoring STDIN to its old value.
The child process is affecting your STDIN because "the STDIN filehandle used by the command is inherited from Perl's STDIN." – perlop manual
This is just an example; in your actual script, replace the sed command with your actual command to run.
use strict;
use warnings;
# Run a command and get its output
sub get_output {
    # Prevent passing our STDIN to the child process
    local *STDIN;
    print "Running sed\n";
    # Replace the sed command with the actual command you want to run
    return `sed 's/a/b/'`;
}
my $output = get_output();
print $output;
#We can still read STDIN even after running a child process
print "Waiting for input\n";
print "Readline is " . scalar readline;
Input:
a
b
c
^D
line
Output:
Running sed
b
b
c
Waiting for input
Readline is line
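A minimal sketch of the reopen variant mentioned above, assuming a file named child_input.txt (an illustrative name) exists for the child to read:
sub get_output_with_input {
    # Detach our STDIN, then give the child its own input file
    local *STDIN;
    open STDIN, "<", "child_input.txt" or die "open: $!";
    my $out = `sed 's/a/b/'`;   # the child reads child_input.txt
    close STDIN;                # close before the local glob is restored
    return $out;
}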

Perl STDERR printed in the wrong order with Tee

I'm trying to redirect STDOUT and STDERR from a perl script - executed from a bash script - to both screen and log file.
perlscript.pl
#!/usr/bin/perl
print "This is a standard output";
print "This is a second standard output";
print STDERR "This is an error";
bashscript.sh
#!/bin/bash
./perlscript.pl 2>&1 | tee -a logfile.log
If I execute the perlscript directly, the screen output is printed in the correct order:
This is a standard output
This is a second standard output
This is an error
But when I execute the bash script, the STDERR is printed first (on both screen and file):
This is an error
This is a standard output
This is a second standard output
With a bash script as the child, the output is ordered flawlessly. Is it a bug in perl or tee? Am I doing something wrong?
The usual trick to turn off buffering is to set the variable $|. Add the line below at the
beginning of your script.
$| = 1;
This turns the buffering off, so STDOUT is flushed after every print and stays in order with the unbuffered STDERR. Also refer to this excellent article by MJD explaining buffering in Perl: Suffering from Buffering?
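Applied to the script from the question, a minimal sketch:
#!/usr/bin/perl
$| = 1;   # autoflush STDOUT so each print is flushed immediately
print "This is a standard output\n";
print "This is a second standard output\n";
print STDERR "This is an error\n";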
I guess this has to do with the way the STDOUT and STDERR buffers are flushed: STDERR is unbuffered, but STDOUT switches from line-buffered to block-buffered when it writes to a pipe instead of a terminal. Try
autoflush STDOUT 1;
at the beginning of your Perl script so that STDOUT is flushed after each print statement.
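The method form is equivalent; on older Perls, load IO::Handle first:
use IO::Handle;          # recent Perls load this on demand
STDOUT->autoflush(1);    # flush STDOUT after every print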

How to read/write into a named pipe in Perl?

I have a script whose input and output are plugged into named pipes. I try to write something to the first named pipe and to read the result from the second named pipe, but nothing happens.
I used open, then open2, then sysopen, without success:
use Fcntl;   # for O_RDWR
sysopen(FH, "/home/Moses/enfr_kiid5/pipe_CGI_Uniform", O_RDWR);
sysopen(FH2, "/home/Moses/enfr_kiid5/pipe_Detoken_CGI", O_RDWR);
print FH "test 4242 test 4242" or die "error print";
This doesn't raise an error, but it doesn't work either: I can't see any trace of the print, the test sentence is never written into the first named pipe, and trying to read from the second one blocks the process.
Works here.
$ mkfifo pipe
$ cat pipe &
$ perl -e 'open my $f, ">", "pipe"; print $f "test\n"'
test
$ rm pipe
You don't really need fancy sysopen stuff; named pipes are really supposed to behave like regular files, albeit half-duplex. That happens to be a difference between your code and mine, worth investigating if you really need this opening pattern.
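For reference, the reading side in Perl is symmetric; a minimal sketch using the same pipe name:
# Opening a FIFO for reading blocks until a writer connects
open my $in, "<", "pipe" or die "open: $!";
while (my $line = <$in>) {
    print "got: $line";
}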
You may need to unbuffer your output after opening the pipe:
sysopen(...);
sysopen(...);
$old = select FH;   # make FH the default output handle
$| = 1;             # enable autoflush on FH
select $old;        # restore the previous default handle
print FH ...;
And, as friedo says, add a newline ("\n") to the end of your print statement!
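Putting that together, a minimal sketch of the writing side with the path from the question; IO::Handle's autoflush is equivalent to the select trick above:
use strict;
use warnings;
use Fcntl;        # for O_RDWR
use IO::Handle;   # for autoflush

sysopen(my $fh, "/home/Moses/enfr_kiid5/pipe_CGI_Uniform", O_RDWR)
    or die "sysopen: $!";
$fh->autoflush(1);                  # push each print straight into the pipe
print $fh "test 4242 test 4242\n"   # trailing newline for line-based readers
    or die "print: $!";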

Perl one-liner: how to reference the filename passed in when -ne or -pe commandline switches are used

In Perl, it's normally easy enough to get a reference to the commandline arguments. I just use $ARGV[0] for example to get the name of a file that was passed in as the first argument.
When using a Perl one-liner, however, it seems to no longer work. For example, here I want to print the name of the file that I'm iterating through if a certain string is found within it:
perl -ne 'print $ARGV[0] if(/needle/)' haystack.txt
This doesn't work, because ARGV doesn't get populated when the -n or -p switch is used. Is there a way around this?
What you are looking for is $ARGV. Quote from perlvar:
$ARGV
Contains the name of the current file when reading from <>.
So, your one-liner would become:
perl -ne 'print $ARGV if(/needle/)' haystack.txt
Though be aware that it will print once for each matching line. If you want a newline added to the print, you can use the -l option.
perl -lne 'print $ARGV if(/needle/)' haystack.txt
If you want it to print only once per file in which a match is found, you can close the ARGV file handle to make it skip to the next file:
perl -lne 'if (/needle/) { print $ARGV; close ARGV }' haystack.txt haystack2.txt
As Peter Mortensen points out, $ARGV and $ARGV[0] are two different variables. $ARGV[0] refers to the first element of the array @ARGV, whereas $ARGV is a scalar which is a completely different variable.
You say that @ARGV is not populated when using the -p or -n switch, which is not true. The code that runs silently is something like:
while (@ARGV) {
    $ARGV = shift @ARGV;            # arguments are removed during runtime
    open ARGV, $ARGV or die $!;
    while (defined($_ = <ARGV>)) {  # long version of: while (<>) {
        # your code goes here
    } continue {                    # when using the -p switch
        print $_;                   # it includes a print statement
    }
}
In essence this means that $ARGV[0] will never show the current file name, because it has already been shifted out of @ARGV and placed in $ARGV by the time your code runs.
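You can watch the shift happen from the command line; assuming two small files a.txt and b.txt (illustrative names), each containing one line:
perl -ne 'print "file: $ARGV, remaining: @ARGV\n"' a.txt b.txt
This prints something like:
file: a.txt, remaining: b.txt
file: b.txt, remaining: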

How can I print just a unix newline in Perl on Win32?

By default, perl prints \r\n in a win32 environment. How can I override this? I'm using perl to make some changes to some source code in a repository, and I don't want to change all the newline characters.
I tried changing the output record separator but with no luck.
Thanks!
Edit: Wanted to include a code sample - I'm doing a search and replace over some files that follow a relatively straightforward pattern like this:
#!/usr/bin/perl
# test.pl
use strict;
use warnings;
$/ = undef;
$\ = "\n";
$^I=".old~";
while (<>) {
    while (s/hello/world/) {
    }
    print;
}
This should replace any instances of "hello" with "world" for any files passed on the cmd line.
Edit 2: I tried the binmode as suggested without any luck initially. I delved a bit more and found that $^I (the inplace edit special variable) was overriding binmode. Any work around to still be able to use the inplace edit?
Edit 3: As Sinan points out below, I needed to use binmode ARGVOUT with $^I instead of binmode STDOUT in my example. Thanks.
Printing "\n" to a filehandle on Windows emits, by default, a CARRIAGE RETURN ("\015") followed by a LINE FEED ("\012") character because that the standard newline sequence on Windows.
This happens transparently, so you need to override it for the special filehandle ARGVOUT (see perldoc perlvar):
#!/usr/bin/perl -i.bak
use strict; use warnings;
local ($\, $/);
while (<>) {
    binmode ARGVOUT;
    print;
}
Output:
C:\Temp> xxd test.txt
0000000: 7465 7374 0d0a 0d0a test....
C:\Temp> h test.txt
C:\Temp> xxd test.txt
0000000: 7465 7374 0a0a test..
See also perldoc open, perldoc binmode and perldoc perliol (thanks daotoad).
Does binmode( STDOUT ) work?
Re: your question about the binmode being lost when $^I opens a new output handle, you could solve this with the open pragma:
use open OUT => ':raw';
which will force all filehandles opened for writing to have the ':raw' PerlIO layer (equivalent to calling binmode on them with no layer argument) applied. Just take care, if you're opening anything else for output, to apply :crlf or any other layer as needed.
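A minimal sketch of the question's replace loop under that pragma (same hello/world substitution as above):
#!/usr/bin/perl -i.old~
use strict;
use warnings;
use open OUT => ':raw';   # the handle $^I opens for output gets :raw too
while (<>) {
    s/hello/world/g;
    print;                # "\n" is written as a single LF byte
}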
The data you are reading in contains line endings, so you're getting them back out again. You can strip them off yourself with chomp, then add your own line ending back, provided you have set binmode as Sinan describes:
while (<>) {
    binmode ARGVOUT;
    chomp;          # strip off \r\n
    while (s/search/replace/) {
        # ...
    }
    print;
    print "\n";     # add your own line ending back
}
By default, perl prints \r\n in a win32 environment. How can I override this?
I ended up creating my own file and setting binmode(fh) specifically. I could not get STDOUT (or ARGVOUT) to work reliably under both Windows 10 (perl 5.8.8) and Windows 7 (perl 5.14.4).
perl -e 'open(fh, ">x"); binmode(fh); print fh "\n";' ; od -c x
0000000 \n
Sometimes the binmode(fh) was needed here and sometimes it seemed to be the default.
I could not get binmode(STDOUT) to work reliably. Some of the following did output just \n under Windows:
perl -e 'binmode(ARGVOUT); print "\n";' | od -c
perl -e 'binmode(STDOUT); print "\n";' | od -c
perl -e 'binmode(STDOUT); syswrite(STDOUT, "\n");' | od -c
... but then not when the output was going to a file. The following still spat out \r\n:
perl -e 'binmode(STDOUT); print "\n";' > x ; od -c x
perl -e 'binmode(ARGVOUT); print "\n";' > x ; od -c x
Interestingly, the following worked when piping to cat, which then writes to a file. Perl must be checking whether STDOUT is a terminal, file, or pipe and enabling the crlf layer or not. Why a pipe is binary but a file is not is an interesting decision. There are also differences between running perl interactively from the command line and running it from a script with the same args and redirects.
perl -e 'binmode(STDOUT); print "\n";' | cat > x ; od -c x
Notice that I tried both print and syswrite. I was surprised that syswrite didn't give me a direct, layer-free path to the filehandle. I also tried to copy the STDOUT filehandle and set binmode on that new filehandle, but that didn't work either. The PERLIO environment variable didn't help either. The use open OUT => ':raw'; worked under Windows 10 perl 5.8.8 but not Windows 7 perl 5.14.4 when redirected to an output file.
Btw, I wasn't doing a print "\n"; in my code when I stumbled over this problem. I was doing a print of pack("c", $num); where $num happened to be 10. Imagine my surprise when my binary file was corrupted by \rs.
Porting sucks!
A Unix newline is a LINE FEED character, which is ASCII code 10.
print "\012";