IMPORTANT: The motivation for this question is not to solve a problem, but to understand Perl's behavior.
Consider the following toy script:
#!/usr/bin/env perl
use strict;
sub main {
@ARGV >= 2 or die "$0: not enough arguments\n";
my $arg_a = shift @ARGV;
my $arg_b = shift @ARGV;
while ( <> ) {
print "+++ $_";
}
}
main();
__END__
This script takes 2 or more arguments (which it doesn't use). All it does is echo (with a +++ prefix) either its standard input or the contents of however many files are specified as its third, fourth, etc., arguments.
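For concreteness, a hypothetical session (toy.pl and notes.txt are invented names):
$ printf 'hello\n' | ./toy.pl a b
+++ hello
$ ./toy.pl a b notes.txt
+++ <first line of notes.txt>
+++ ...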
So far the code behaves as I expect it to.
Now consider this slightly modified version:
#!/usr/bin/env perl
use strict;
sub slurp {
local $/ = undef;
local @ARGV = @_;
return <>;
}
sub main {
@ARGV >= 2 or die "$0: not enough arguments\n";
my $arg_a = shift @ARGV;
my $arg_b = shift @ARGV;
my $content = slurp( $arg_a );
while ( <> ) {
print "+++ $_";
}
}
main();
__END__
This version of the script does not ignore its first argument; rather, it interprets it as the path to a file, and reads that file's contents into the variable $content (which it subsequently ignores). Other than this, the script should behave exactly as before.
Unfortunately, this version of the script no longer echoes its stdin (though it still does echo the contents of its 3rd, 4th, etc. arguments).
I know that the problem has to do with the way the slurp function is implemented, because if I change this implementation to
sub slurp {
local $/ = undef;
open my $input, $_[ 0 ] or die "$!";
return <$input>;
}
...then the script once more echoes its stdin, when available.
I would like to understand why the first version of slurp causes the script to stop working as expected.
You need to exhaust the iterator (by calling it until it returns undef) before it considers using STDIN again.
sub slurp {
local $/ = undef;
local @ARGV = @_;
my $rv = <>; # Read file specified by $_[0].
1 while <>; # Exhaust the iterator.
return $rv;
}
or
sub slurp {
local $/ = undef;
local @ARGV = @_;
my $rv = "";
while (my $file = <>) {
$rv .= $file;
}
return $rv; # Concatenation of all files specified by @_.
}
For <> to work with STDIN it has to be called when @ARGV is empty. If there are filenames in @ARGV when <> is run, they are removed from there as the files are read, and then you'd need to call <> again in order to wait on STDIN.
perl -wE'if (@ARGV) { print while <> }; print while <>' file
It is the second print while <> that waits on STDIN (without it file is printed and program exits).
That could in principle happen with your sub: if it read all the files from @ARGV, then once control was back at the <> invocation in main, that one would wait on STDIN.
However, your sub localizes @ARGV (good practice!), so once it exits, the global @ARGV still has what it did in the beginning.† Then the while in main reads those files (again), gets the one undef it is due at the end of the last file, and exits.
One way to see this: empty @ARGV after the sub that reads input is called and before the while in main. Then that while will wait for STDIN again, regardless of the sub. Like
perl -wE'
sub ri { local @ARGV = @_; return <> };
print for ri(@ARGV);
say "argv: @ARGV";
@ARGV = ();
print while <>
' file
(A detail to note is that your example seems to take two files while the sub deals with one, so even if the sub were to use the global @ARGV (not local-ized) and remove a file from it, there'd still be one file left there to occupy the while in main. So you still wouldn't get STDIN.)
Another way to see all this: add another print while <> at the end; that one will wait on STDIN.
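To make this concrete, a sketch of main with that extra loop added (same slurp as in the question):
sub main {
    @ARGV >= 2 or die "$0: not enough arguments\n";
    my $arg_a = shift @ARGV;
    my $arg_b = shift @ARGV;
    my $content = slurp( $arg_a );
    print "+++ $_" while <>;   # re-reads the files still in @ARGV
    print "--- $_" while <>;   # @ARGV is now empty, so this one waits on STDIN
}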
This is all described in I/O Operators (perlop), although it requires quite close reading.
† On local $GLOBAL_VAR: the value of $GLOBAL_VAR is copied away, and it gets restored as that scope is exited. So local keeps changes made to the global variable from leaking out of its scope.
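A tiny self-contained illustration, with a hypothetical package global $sep:
#!/usr/bin/env perl
use strict;
use warnings;
our $sep = ',';
sub inner { return "sep is [$sep]" }
sub with_tabs {
    local $sep = "\t";   # override for this dynamic scope (and its callees)
    return inner();
}
print with_tabs(), "\n";   # inner() saw the localized tab
print inner(), "\n";       # sep is [,] -- restored after scope exit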
Related
I'm having a problem with a subroutine that locates certain files and extracts some data out of them.
This subroutine is called inside a foreach loop, but whenever the call is made the loop skips to its next iteration. So I am wondering whether any of the next;'s are somehow escaping from the subroutine into the foreach loop where it is called?
To my knowledge the sub looks solid, though, so I'm hoping someone can spot something I'm missing.
sub FindKit{
opendir(DH, "$FindBin::Bin\\data");
my @kitfiles = readdir(DH);
closedir(DH);
my $nametosearch = $_[0];
my $numr = 1;
foreach my $kitfile (@kitfiles)
{
# skip . and .. and Thumbs.db and non-K-files
if($kitfile =~ /^\.$/) {shift @kitfiles; next;}
if($kitfile =~ /^\.\.$/) {shift @kitfiles; next;}
if($kitfile =~ /Thumbs\.db/) {shift @kitfiles; next;}
if($kitfile =~ /^[^K]/) {shift @kitfiles; next;}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<","data\\$kitfile") or die "$!";
while (<$fhkits>) {}
if ($. <= 1) {
print " Empty File!";
next;
}
seek($fhkits,0,0);
while (my $kitrow = <$fhkits>) {
if ($. == 0 && $kitrow =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/g) {
close $fhkits;
return $1;
}
}
$numr++;
close $fhkits;
}
return 0;
}
To summarize comments, the refactored code:
use File::Glob ':bsd_glob';
sub FindKit {
my $nametosearch = $_[0];
my @kitfiles = glob "$FindBin::Bin/data/K*"; # files that start with K
foreach my $kitfile (@kitfiles)
{
open my $fhkits, '<', $kitfile or die "$!";
my $kitrow_first_line = <$fhkits>; # read first line
return if eof; # next read is end-of-file so it was just header
my ($result) = $kitrow_first_line =~
/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
return $result if $result;
}
return 0;
}
I use core File::Glob and enable :bsd_glob option, which can handle spaces in filenames. I follow the docs note to use "real slash" on Win32 systems.
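A minimal sketch of why that matters (the path is hypothetical): the default glob splits its pattern on whitespace, so a path containing a space becomes two patterns, while :bsd_glob treats it as one.
use File::Glob ':bsd_glob';
my @kitfiles = glob "C:/Program Files/app/data/K*";   # one pattern, space and all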
I check whether there is only a header line using eof.†
I do not see how this can affect the calling code, other than by its return value. Also, I don't see how the posted code could make the caller skip a beat, either. That problem is unlikely to be in this sub.
Please let me know if I missed some point with the above rewrite.
† Previous version used to check whether there is just one (header) line by
1 while <$fhkits>; # check number of lines ...
return if $. == 1; # there was only one line, the header
Also correct, but eof is way better.
The thing that is almost certainly screwing you here, is that you are shifting the list that you are iterating.
That's bad news, as you're deleting elements ... but in places you aren't necessarily thinking.
For example:
#!/usr/bin/env perl
use strict;
use warnings;
my @list = qw( one two three );
my $count;
foreach my $value ( @list ) {
    print "Iteration ", ++$count," value is $value\n";
    if ( $value eq 'two' ) { shift @list; next };
}
print "@list";
How many times do you think that should iterate, and which values should end up in the array?
Because you shift, you never process element 'three' and you delete element 'one'. That's almost certainly what's causing your problems.
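With a typical perl, that prints only two iterations, and shows 'one' gone from the array:
Iteration 1 value is one
Iteration 2 value is two
two three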
You also:
open using a relative path, when your opendir used an absolute one.
skip a bunch of files, and then skip anything that doesn't start with K. Why not just search for things that do start with K?
read the file twice, one pass being just to check whether it's empty. The perl file test -z will do this just fine.
you set $kitrow for each line in the file, but don't really use it for anything other than pattern matching. It'd probably work better using implicit variables.
You only actually do anything on the first line - so you don't ever need to iterate the whole file. ($numr seems to be discarded).
you use a global match, but only use one result. The g flag seems redundant here.
I'd suggest a big rewrite, and do something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin;
sub FindKit {
    my ($nametosearch) = @_;
    foreach my $kitfile (glob "$FindBin::Bin\\data\\K*")
    {
        if ( -z $kitfile ) {
            print "$kitfile is empty\n";
            next;
        }
        # $kitfile is the file used on this iteration of the loop
        open (my $fhkits, "<", $kitfile) or die "$!";
        # only the first line can match, so a single read is enough
        <$fhkits> =~ m/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/
            and return $1;
    }
    return 0;
}
As a big fan of the Path::Tiny module (I always have it installed and use it in every project), my solution would be:
use strict;
use warnings;
use Path::Tiny;
my $found = FindKit('mykit');
print "$found\n";
sub FindKit {
my ($nametosearch) = @_;
my $datadir = path($0)->realpath->parent->child('data');
die "$datadir doesn't exists" unless -d $datadir;
for my $file ($datadir->children( qr /^K/ )) {
next if -z $file; #skip empty
my @lines = $file->lines;
return $1 if $lines[0] =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
}
return;
}
Some comments and still open issues:
Using Path::Tiny you can always use forward slashes in path names, regardless of the OS (UNIX/Windows); e.g. data/file will work on Windows too.
AFAIK FindBin is considered broken, so the above uses $0 and realpath ...
what if the Kit is in multiple files? The above always returns on the 1st one found (see the sketch after this list)
the my @lines = $file->lines; reads all lines, which is unnecessary, but for small files it's no big deal (the sketch below avoids it)
in reality this function returns the value for the Maakartikel, so a better name would probably be find_article_by_kit or find_article :)
it is easy to switch to UTF-8: just change $file->lines to $file->lines_utf8;
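To illustrate the multiple-files and all-lines points, a hypothetical variant (find_article_all is an invented name, reusing the Path::Tiny import above) that reads only the first line of each file and collects every match:
sub find_article_all {
    my ($nametosearch) = @_;
    my $datadir = path($0)->realpath->parent->child('data');
    die "$datadir doesn't exist" unless -d $datadir;
    my @found;
    for my $file ($datadir->children( qr/^K/ )) {
        next if -z $file;                              # skip empty
        my ($first) = $file->lines( { count => 1 } );  # read only the first line
        push @found, $1
            if $first =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
    }
    return @found;
}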
I have a very basic perl script which prints the next line in a text file after matching a search pattern.
@ARGV = <dom_boot.txt>;
while ( <> ) {
print scalar <> if /name=sacux445/;
}
This works. However, I would like to capture the output into a file for further use, rather than printing it to STDOUT.
I'm just learning (slowly) so attempted this:
my $fh;
my $dom_bootdev = 'dom_bootdev.txt';
open ($fh, '>', $dom_bootdev) or die "No such file";
@ARGV = <dom_boot.txt>;
while ( <> ) {
print $fh <> if /name=sacux445/;
}
close $fh;
But I get a syntax error.
syntax error at try.plx line 19, near "<>"
I'm struggling to figure this out. I'm guessing it's probably very simple so any help would be appreciated.
Thanks,
Luke.
The Perl parser sometimes has problems with indirect notation. The canonical way to handle it is to wrap the handle into a block:
print {$fh} <> if /name=sacux445/;
Are you sure you want to remove scalar?
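If the next line is still wanted as a single scalar read, the two points combine like this (a minimal sketch; the loop shown next also adds the defined-check you need at end-of-file):
while (<>) {
    print {$fh} scalar <> if /name=sacux445/;
}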
Simply fetch the next line within the loop and print it, if the line matches the pattern:
while (<>) {
next unless /name=sacux445/;
my $next = <>;
last unless defined $next;
print $fh $next;
}
Note, you need to check the return value of the diamond operator.
Input
name=sacux445 (1)
aaa
name=sacux445 (2)
bbb
name=sacux445 (3)
Output
aaa
bbb
One should learn to use state machines for parsing data. A state machine allows the input read to be in only one place in the code. Rewriting the code as a state machine:
use strict;
use warnings;
use autodie; # See http://perldoc.perl.org/autodie.html
my $dom_bootdev = 'dom_bootdev.txt';
open ( my $fh, '>', $dom_bootdev ); # autodie handles open errors
use File::Glob qw( :bsd_glob ); # Perl's default glob() does not handle spaces in file names
@ARGV = glob( 'dom_boot.txt' );
my $print_next_line = 0;
while( my $line = <> ){
if( $line =~ /name=sacux445/ ){
$print_next_line = 1;
next;
}
if( $print_next_line ){
print {$fh} $line;
$print_next_line = 0;
next;
}
}
When To Use a State Machine
If the data is flat (regular), it can be parsed using only regular expressions.
If the data has a simple sectioned structure, it can be parsed using a simple state machine.
For more complex structures, at least one state machine with a push-down stack is required. The stack records the previous state so that the machine can return to it when the current state is finished.
One of the most complex data formats in common use is XML. It requires a state machine for its syntax and a second one with a stack for its semantics.
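As a toy illustration of the push-down idea (the { } block format here is invented), the stack remembers which nested section the current line belongs to:
#!/usr/bin/env perl
use strict;
use warnings;
my @stack;
while (my $line = <DATA>) {
    if    ($line =~ /^(\w+)\s*\{/) { push @stack, $1 }   # enter a section
    elsif ($line =~ /^\}/)         { pop @stack }        # leave it
    elsif (@stack)                 { print "@stack: $line" }
}
__DATA__
outer {
inner {
a line
}
}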
I have already defined a bunch of functions which do a lot of work and have a bunch of print statements. They can be called like so to build an html page.
print_A();
print_B();
print_C();
Now, I want to call these functions and store the contents of these print statements into one main variable. One way is to rewrite these functions so they return a string with their contents (instead of printing)
my $var = "";
$var = $var . store_A();
$var = $var . store_B();
$var = $var . store_C();
But I want to do this without modifying or rewriting the functions. I don't want to redefine these functions with such a minor change (there are hundreds of these functions in the program).
Is there a shorter and faster way to do this in perl?
One way is to use select to redirect STDOUT to a scalar variable:
use warnings;
use strict;
my $out;
open my $fh, '>', \$out;
my $old_stdout = select $fh;
s1();
s2();
select $old_stdout;
close $fh;
print "start\n";
print $out;
print "end\n";
sub s1 {print "s1\n"}
sub s2 {print "s2\n"}
Prints out:
start
s1
s2
end
Depending on just what these functions do, you may be able to run them in a subprocess and capture their output:
my $pid = open(PIPE, "-|");
if (0 == $pid) {
# Child
print_A();
print_B();
print_C();
exit(0);
}
else {
my $var = "";
while(<PIPE>) {
$var .= $_;
}
close(PIPE);
}
You'll have to evaluate whether it's safe to move these function calls into a subprocess. If one of these functions changes the process's global state--for example, if it modifies a global variable--then that change will be confined to the child process, and won't happen in the original script process.
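A toy illustration of that caveat ($count and print_A are hypothetical):
use strict;
use warnings;
our $count = 0;
sub print_A { $count++; print "A\n" }
my $pid = open(my $pipe, '-|');
die "fork failed: $!" unless defined $pid;
if (0 == $pid) {           # child: STDOUT is connected to the pipe
    print_A();
    exit(0);
}
my $var = do { local $/; <$pipe> };   # parent captures the child's output
close($pipe);
print "captured: $var";    # captured: A
print "count: $count\n";   # count: 0 -- the increment stayed in the child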
You can also use IO::Scalar to tie a variable to STDOUT.
use IO::Scalar;
my $output_str;
tie *STDOUT, 'IO::Scalar', \$output_str;
print "Hel", "lo, ";
print "world!\n";
untie *STDOUT;
print "output_str is $output_str\n";
# prints
# output_str is Hello, world!
Not effectively different from @toolic's answer though.
This perl script traverses all directories and subdirectories, searching for a file named RUN in each. It then opens the file and runs the 1st line written in it. The problem is that I am not able to redirect the output of the system command to a file named error.log, nor STDERR to another file named test_file.errorlog; no such files are created.
Note that all variables are declared, even if that is not shown here.
find (\&pickup_run,$path_to_search);
### Subroutine for extracting path of directories with RUN FILE PRESENT
sub pickup_run {
if ($File::Find::name =~/RUN/) {
### If RUN file is present , push it into array named run_file_present
push(@run_file_present,$File::Find::name);
}
}
###### Iterate over the array containing paths to directories containing RUN files one by one
foreach my $var (@run_file_present) {
$var =~ s/\//\\/g;
($path_minus_run=$var) =~ s/RUN\b//;
#print "$path_minus_run\n";
my $test_case_name;
($test_case_name=$path_minus_run) =~ s/expression to be replaced//g;
chdir "$path_minus_run";
########While iterating over the paths, open each file
open data, "$var";
#####Run the first two lines containing commands
my @lines = <data>;
my $return_code=system (" $lines[0] >error.log 2>test_file.errorlog");
if($return_code) {
print "$test_case_name \t \t FAIL \n";
}
else {
print "$test_case_name \t \t PASS \n";
}
close (data);
}
The problem is almost certainly that $lines[0] has a newline at the end after being read from the file.
But there are several improvements you could make:
Always use strict and use warnings at the top of every Perl program, and declare all your variables using my as close as possible to their first point of use
Use the three-parameter form of open and always check whether it succeeded, putting the built-in variable $! into your die string to say why it failed. You can also use autodie to save writing the code for this manually for every open, but it requires Perl v5.10.1 or better
You shouldn't put quotes around scalar variables -- just use them as they are, so chdir $path_minus_run and open data, $var are correct
There is also no need to save all the files to be processed and deal with them later. Within the wanted subroutine, File::Find sets you up with $File::Find::dir set to the directory containing the file, and $_ set to the bare file name without a path. It also does a chdir to the directory for you, so the context is ideal for processing the file
use strict;
use warnings;
use v5.10.1;
use autodie;
use File::Find;
my $path_to_search;    # set this to the top of the directory tree to search
find( \&pickup_run, $path_to_search );
sub pickup_run {
return unless -f and $_ eq 'RUN';
my $cmd = do {
open my $fh, '<', $_;
<$fh>;
};
chomp $cmd;
( my $test_name = $File::Find::dir ) =~ s/expression to be replaced//g;
my $retcode = system( "$cmd >error.log 2>test_file.errorlog" );
printf "%s\t\t%s\n", $test_name, $retcode ? 'FAIL' : 'PASS';
}
I'm trying to read from two files and generate output in a third. I first wanted to edit the first one in place, but I didn't find a suitable method, save for arrays.
My problem is that the third file (the output) is empty whenever I uncomment the _ref_param_handling function. BUT here is what puzzles me the most: if I do a very basic UNIX `cat` system call on the output file at the end (see code below), it works just fine. If I open the filehandle just before and close it right after editing, it also works fine (around my print FILEHANDLE LIST).
I undoubtedly am missing something here. Apart from a problem between my keyboard and my chair, what is it? A filehandle conflict? A scope problem?
Every variable is declared and has the value I want it to have.
Edit (not applicable anymore).
Using IO::File on the three files didn't change anything.
Edit 2 : New full subroutine code
My code works (except when my ref already exists, but that's because of the "append" mode, I think), though there might be some mistakes and unperlish ways of coding (sorry, Monks). I do, however, use strict and warnings!
sub _ref_edit($) {
my $manda_def = "$dir/manda_def.list";
my $newrefhandle;
my $ref = $_[0];
(my $refout = $ref) =~ s/empty//;
my $refhandle;
my $parname = '';
my $parvalue = '';
my @val;
_printMan;
my $flush = readline STDIN; # Wait for <enter>
# If one or both of the ref. and the default values are missing
if ( !( -e $manda_def && -e $ref ) ) {
die "Cannot find $ref and/or $manda_def";
}
# Open needed files (ref & default)
open( $refhandle, "<", $ref ) or die "Cannot open ref $ref : $!";
open( $newrefhandle, ">>", $refout )
or die "Cannot open new ref $refout : $!";
# Read each line
while ( my $refline = <$refhandle> ) {
# If line read not an editable macro
if ( $refline =~ /^define\({{(.+)}},\s+{{.*__VALUE__.*}}\)/ ){
$parname = $1; # $1 = parameter name captured in regexp
# Prompt user
$parvalue = _ref_param_handling( $parname, $manda_def );
# Substitution in ref
$refline =~ s/__VALUE__/$parvalue/;
# Param not specified and no default value
$parvalue eq '' ? $refline=~s/__COM__/#/ : $refline=~s/__COM__//;
}
print $newrefhandle $refline;
}
close $newrefhandle;
close $refhandle;
return $refout;
} # End ref edit
The _ref_param_handling subroutine is still:
open( $mde, '<', $_[1] )
or die "Cannot open mandatory/default list $_[1] : $!";
# Read default/mandatory file list
while (<$mde>) {
( $name, $manda, $default, $match, $descript ) = split( /\s+/, $_, 5 );
next if ( $name !~ $ref_param ); # If param read differs from parname
(SOME IF/ELSE)
} # End while <MDE>
close $mde;
return $input;
}
Extract from manda_def file :
NAME Mandatory? Default Match Comm.
PORT y NULL ^\d+$ Database port
PROJECT y NULL \w{1,5} Project name
SERVER y NULL \w+ Server name
modemRouting n NULL .+
modlib y bin .+
modules y sms .+
Extract from ref_file :
define({{PORT}}, {{__VALUE__}})dnl
define({{PROJECT}}, {{__VALUE__}})dnl
define({{SERVER}}, {{__VALUE__}})dnl
define({{modemRouting}}, {{__COM__{{$0}} '__VALUE__'}})dnl
define({{modlib}}, {{__COM__{{$0}} '__VALUE__'}})dnl
define({{modules}}, {{__COM__{{$0}} '__VALUE__'}})dnl
Any help appreciated.
It is unclear what is initialising $refhandle, $newrefhandle and $mde. The values they hold will affect the behaviour of open - i.e. whether it will close any filehandles before opening a new one.
I would suggest that you start using the IO::File interface to open/write to files, as this makes the job of filehandle management much easier, and will avoid any inadvertent closes. Something like...
use IO::File;
my $refhandle = IO::File->new("< $ref") or die "open() - $!";
$refhandle->print(...);
As far as editing files in place goes, this is a common pattern I use to achieve it, making use of the -i behaviour of perl.
sub edit_file
{
my ($filename) = @_;
# re-create the behaviour of perl -pi.bak on the command line by localizing
# @ARGV as the list of files <> will process, and $^I as the backup suffix
local (@ARGV) = ($filename);
local ($^I) = '.bak';
while (<>)
{
s/original string/new string/g;
}
continue
{
print;
}
}
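Hypothetical usage (config.txt is an invented name); the original file is kept as config.txt.bak:
edit_file('config.txt');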
Try opening the second file handle for input outside the loop and pass a reference to it into the subroutine _ref_param_handle. Use the seek function to rewind the file back to the start. If your file is not too large, you can also store the contents in an array and access that, instead of looping over the same contents.
EDIT:
Here is a small example to support what I was trying to say above:
#!/usr/bin/perl -w
sub test
{
my $fh_to_read = $_[0] ;
my $fh_to_write = $_[1] ;
while(<$fh_to_read>)
{
print $fh_to_write $_ ;
}
seek($fh_to_read,0,0) ;
}
open(FH1,"<dummy1");
open(FH2,"<dummy2");
open(FH3,">dummy3");
while(<FH2>)
{
print FH3 "$_" ;
test(\*FH1,\*FH3);
}
(See perlref for info about Perl references.)
From what I gather, your script wants to convert a file in the following form:
define({{VAR1}}, {{__VALUE__}})
define({{VAR2}}, {{__VALUE__}})
define({{VAR3}}, {{__VALUE__}})
define({{VAR4}}, {{__VALUE__}})
to something like this:
define({{VAR1}}, {{}})
define({{VAR2}}, {{VALUE2}})
define({{VAR3}}, {{VALUE3}})
define({{VAR4}}, {{}})
The following works. I don't know what manda_def means, and also I didn't bother to create an actual variable replacement function.
#!/usr/bin/perl
use strict;
use warnings;
sub work {
my ($ref, $newref, $manda_def) = @_;
# Open needed files (ref & default)
open(my $refhandle, '<', $ref) or die "Cannot open ref $ref : $!";
open(my $newrefhandle, '>', $newref) or die "Cannot open new ref $newref: $!";
# Read each line
while (my $refline = <$refhandle>) {
# if line read is not an editable macro
if ($refline =~ /^define\({{(.+)}},\s+{{.*__VALUE__.*}}\)/){
my $parvalue = _ref_param_handling($1, $manda_def); # manda_def?
# Substitution in ref
$refline =~ s/__VALUE__/$parvalue/;
# Param not specified and no default value
$refline =~ s/__COM__/#/ if $parvalue eq '';
}
print $newrefhandle $refline;
}
close $newrefhandle;
close $refhandle;
return $newref;
}
sub _ref_param_handling {
my %parms = (VAR2 => 'VALUE2', VAR3 => 'VALUE3');
return $parms{$_[0]} if exists $parms{$_[0]};
}
work('ref.txt', 'newref.txt', 'manda.txt');
Guys, I seriously consider hanging myself with my wireless mouse.
My script never failed. I just didn't run it through to the end (it's actually a very long parameter list). The output simply appears once the filehandle is closed (or so I gather)...
/me *cries*
I've spent 24 hours on this...