I have already defined a bunch of functions which do a lot of work and have a bunch of print statements. They can be called like so to build an html page.
print_A()
print_B()
print_C()
Now, I want to call these functions and store the contents of these print statements into one main variable. One way is to rewrite these functions so they return a string with their contents (instead of printing)
my $var = "";
$var = $var . store_A();
$var = $var . store_B();
$var = $var . store_C();
But I want to do this without modifying or rewriting the functions. I don't want to redefine these functions with such a minor change (there are hundreds of these functions in the program).
Is there a shorter and faster way to do this in perl?
One way is to use select to redirect STDOUT to a scalar variable:
use warnings;
use strict;
my $out;
open my $fh, '>', \$out;
my $old_stdout = select $fh;
s1();
s2();
select $old_stdout;
close $fh;
print "start\n";
print $out;
print "end\n";
sub s1 {print "s1\n"}
sub s2 {print "s2\n"}
Prints out:
start
s1
s2
end
Depending on just what these functions do, you may be able to run them in a subprocess and capture their output:
my $pid = open(PIPE, "-|");
if (0 == $pid) {
# Child
print_A();
print_B();
print_C();
exit(0);
}
else {
my $var = "";
while(<PIPE>) {
$var .= $_;
}
close(PIPE);
}
You'll have to evaluate whether it's safe to move these function calls into a subprocess. If one of these functions changes the process's global state--for example, if it modifies a global variable--then that change will be confined to the child process, and won't happen in the original script process.
You can also use IO::Scalar to tie a variable to STDOUT.
use IO::Scalar;
my $output_str;
tie *STDOUT, 'IO::Scalar', \$output_str;
print "Hel", "lo, ";
print "world!\n";
untie *STDOUT;
print "output_str is $output_str\n";
# prints
# output_str is Hello, world!
Not effectively different from #toolic's answer though.
Related
IMPORTANT: The motivation for this question is not to solve a problem, but to understand Perl's behavior.
Consider the following toy script:
#!/usr/bin/env perl
use strict;
sub main {
#ARGV >= 2 or die "$0: not enough arguments\n";
my $arg_a = shift #ARGV;
my $arg_b = shift #ARGV;
while ( <> ) {
print "+++ $_";
}
}
main();
__END__
This script takes 2 or more arguments (which it doesn't use). All it does is to echo (with a +++ prefix) either its standard input or the contents of however many files are specified as its third, fourth, etc., arguments.
So far the code behaves as I expect it to.
Now consider this slightly modified version:
#!/usr/bin/env perl
use strict;
sub slurp {
local $/ = undef;
local #ARGV = #_;
return <>;
}
sub main {
#ARGV >= 2 or die "$0: not enough arguments\n";
my $arg_a = shift #ARGV;
my $arg_b = shift #ARGV;
my $content = slurp( $arg_a );
while ( <> ) {
print "+++ $_";
}
}
main();
__END__
This version of the script does not ignore its first argument; rather, it interprets as the path to a file, and reads its contents into the variable $content (which it subsequently ignores). Other than this, the script should behave exactly as before.
Unfortunately, this version of the script no longer echoes its stdin (though it still does echo the contents of its 3rd, 4th, etc. arguments).
I know that the problem has to do with the way the slurp function is implemented, because if I change this implementation to
sub slurp {
local $/ = undef;
open my $input, $_[ 0 ] or die "$!";
return <$input>;
}
...then the script once more echoes its stdin, when available.
I would like to understand why the first version of slurp causes the script to stop working as expected.
You need to exhaust the iterator (by calling it until it returns undef) before it considers using STDIN again.
sub slurp {
local $/ = undef;
local #ARGV = #_;
my $rv = <>; # Read file specified by $_[0].
1 while <>; # Exhaust the iterator.
return $rv;
}
or
sub slurp {
local $/ = undef;
local #ARGV = #_;
my $rv = "";
while (my $file = <>) {
$rv .= $file;
}
return $rv; # Concatenation of all files specified by #_.
}
For <> to work with STDIN it has to be called when #ARGV is empty. If there are filenames in #ARGV when <> is run they are removed from there as files are read, and then you'd need to call <> again in order to wait for the STDIN.
perl -wE'if (#ARGV) { print while <> }; print while <>' file
It is the second print while <> that waits on STDIN (without it file is printed and program exits).
That could in principle happen with your sub, if it were to read all files from #ARGV and once the control is back at the <> invocation in the main that one would then wait for STDIN.
However, your sub localizes the #ARGV (good practice!), so once it exits the global #ARGV still has what it did in the beginning.† Then while in the main reads those files (again), gets that one undef that it is due at the end of the last file, and exits.
One way to see this: remove all from #ARGV after the sub that reads input is called and before the while in main,. Then that while will wait for STDIN again, regardless of the sub. Like
perl -wE'
sub ri { local #ARGV = #_; return <> };
print for ri(#ARGV);
say"argv: #ARGV";
#ARGV=();
print while <>
' file
(A detail to note is that your example seems to take two files while the sub deals with one, so even if the sub were to use the global #ARGV (not local-ized) and remove a file from #ARGV, there'd still be one file left there to occupy the while in the main. So you still wouldn't get STDIN.)
Another way to see all this: add another print while <>, at the end; that one will wait on STDIN.
This is all described in I/O Operators (perlop), albeit it requires quite close reading.
† On local $GLOBAL_VAR; the value of $GLOBAL_VAR is copied away, and it gets restored as that scope is exited. So local protects the global variable from changes, within its scope.
I'm having some problem with a subroutine that locates certain files and extracts some data out of them.
This subroutine is called inside a foreach loop, but whenever the call is made the loop skips to its next iteration. So I am wondering whether any of the next;'s are somehow escaping from the subroutine to the foreach loop where it is called?
To my knowledge the sub looks solid though so I'm hoping if anyone can see something I'm missing?
sub FindKit{
opendir(DH, "$FindBin::Bin\\data");
my #kitfiles = readdir(DH);
closedir(DH);
my $nametosearch = $_[0];
my $numr = 1;
foreach my $kitfile (#kitfiles)
{
# skip . and .. and Thumbs.db and non-K-files
if($kitfile =~ /^\.$/) {shift #kitfiles; next;}
if($kitfile =~ /^\.\.$/) {shift #kitfiles; next;}
if($kitfile =~ /Thumbs\.db/) {shift #kitfiles; next;}
if($kitfile =~ /^[^K]/) {shift #kitfiles; next;}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<","data\\$kitfile") or die "$!";
while (<$fhkits>) {}
if ($. <= 1) {
print " Empty File!";
next;
}
seek($fhkits,0,0);
while (my $kitrow = <$fhkits>) {
if ($. == 0 && $kitrow =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/g) {
close $fhkits;
return $1;
}
}
$numr++;
close $fhkits;
}
return 0;
}
To summarize comments, the refactored code:
use File::Glob ':bsd_glob';
sub FindKit {
my $nametosearch = $_[0];
my #kitfiles = glob "$FindBin::Bin/data/K*"; # files that start with K
foreach my $kitfile (#kitfiles)
{
open my $fhkits, '<', $kitfile or die "$!";
my $kitrow_first_line = <$fhkits>; # read first line
return if eof; # next read is end-of-file so it was just header
my ($result) = $kitrow_first_line =~
/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
return $result if $result;
}
return 0;
}
I use core File::Glob and enable :bsd_glob option, which can handle spaces in filenames. I follow the docs note to use "real slash" on Win32 systems.
I check whether there is only a header line using eof.†
I do not see how this can affect the calling code, other than by its return value. Also, I don't see how the posted code can make the caller skip the beat, either. That problem is unlikely to be in this sub.
Please let me know if I missed some point with the above rewrite.
† Previous version used to check whether there is just one (header) line by
1 while <$fhkits>; # check number of lines ...
return if $. == 1; # there was only one line, the header
Also correct but eof is way better
The thing that is almost certainly screwing you here, is that you are shifting the list that you are iterating.
That's bad news, as you're deleting elements ... but in places you aren't necessarily thinking.
For example:
#!/usr/bin/env perl
use strict;
use warnings;
my #list = qw ( one two three );
my $count;
foreach my $value ( #list ) {
print "Iteration ", ++$count," value is $value\n";
if ( $value eq 'two' ) { shift #list; next };
}
print "#list";
How many times do you think that should iterate, and which values should end up in the array?
Because you shift you never process element 'three' and you delete element 'one'. That's almost certainly what's causing you problems.
You also:
open using a relative path, when your opendir used an absolute one.
skip a bunch of files, and then skip anything that doesn't start with K. Why not just search for things that do start with K?
read the file twice, and one is to just check if it's empty. The perl file test -z will do this just fine.
you set $kitrow for each line in the file, but don't really use it for anything other than pattern matching. It'd probably work better using implicit variables.
You only actually do anything on the first line - so you don't ever need to iterate the whole file. ($numr seems to be discarded).
you use a global match, but only use one result. The g flag seems redundant here.
I'd suggest a big rewrite, and do something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin;
sub FindKit{
my ($nametosearch) = #_;
my $numr = 1;
foreach my $kitfile (glob "$FindBin::Bin\\data\\K*" )
{
if ( -z $kitfile ) {
print "$kitfile is empty\n";
next;
}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<", $kitfile) or die "$!";
<$kitfile> =~ m/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/
and return $1;
return 0;
}
}
As a big fan of the Path::Tiny module (me have it always installed and using it in every project) my solution would be:
use strict;
use warnings;
use Path::Tiny;
my $found = FindKit('mykit');
print "$found\n";
sub FindKit {
my($nametosearch) = #_;
my $datadir = path($0)->realpath->parent->child('data');
die "$datadir doesn't exists" unless -d $datadir;
for my $file ($datadir->children( qr /^K/ )) {
next if -z $file; #skip empty
my #lines = $file->lines;
return $1 if $lines[0] =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
}
return;
}
Some comments and still opened issues:
Using the Path::Tiny you could always use forward slashes in the path-names, regardless of the OS (UNIX/Windows), e.g. the data/file will work on windows too.
AFAIK the FindBin is considered broken - so the above uses the $0 and realpath ...
what if the Kit is in multiple files? The above always returns on the 1st found one
the my #lines = $file->lines; reads all lines - unnecessary - but on small files doesn't big deal.
the the reality this function returns the arg for the Maakartikel, so probably better name would be find_articel_by_kit or find_articel :)
easy to switch to utf8 - just change the $file->lines to $file->lines_utf8;
I have a very basic perl script which prints the next line in a text file after matching a search pattern.
#ARGV = <dom_boot.txt>;
while ( <> ) {
print scalar <> if /name=sacux445/;
}
Which works, However I would like to capture the output into a file for further use, rather than printing it to STDOUT.
I'm just learning (slowly) so attempted this:
my $fh;
my $dom_bootdev = 'dom_bootdev.txt';
open ($fh, '>', $dom_bootdev) or die "No such file";
#ARGV = <dom_boot.txt>;
while ( <> ) {
print $fh <> if /name=sacux445/;
}
close $fh;
But I get a syntax error.
syntax error at try.plx line 19, near "<>"
I'm struggling to figure this out. I'm guessing it's probably very simple so any help would be appreciated.
Thanks,
Luke.
The Perl parser sometimes has problems with indirect notation. The canonical way to handle it is to wrap the handle into a block:
print {$fh} <> if /name=sacux445/;
Are you sure you want to remove scalar?
Simply fetch the next line within the loop and print it, if the line matches the pattern:
while (<>) {
next unless /name=sacux445/;
my $next = <>;
last unless defined $next;
print $fh $next;
}
Note, you need to check the return value of the diamond operator.
Input
name=sacux445 (1)
aaa
name=sacux445 (2)
bbb
name=sacux445 (3)
Output
aaa
bbb
One should learn to use state machines for parsing data. A state machine allows the input read to be in only one place in the code. Rewriting the code as a state machine:
use strict;
use warnings;
use autodie; # See http://perldoc.perl.org/autodie.html
my $dom_bootdev = 'dom_bootdev.txt';
open ( my $fh, '>', $dom_bootdev ); # autodie handles open errors
use File::Glob qw( :bsd_glob ); # Perl's default glob() does not handle spaces in file names
#ARGV = glob( 'dom_boot.txt' );
my $print_next_line = 0;
while( my $line = <> ){
if( $line =~ /name=sacux445/ ){
$print_next_line = 1;
next;
}
if( $print_next_line ){
print {$fh} $line;
$print_next_line = 0;
next;
}
}
When To Us a State Machine
If the data is context-free, it can be parsed using only regular expressions.
If the data has a tree structure, it can be parsed using a simple state machine.
For more complex structures, a least one state machine with a push-down stack is required. The stack records the previous state so that the machine can return to it when the current state is finished.
The most complex data structure in use is XML. It requires a state machine for its syntax and a second one with a stack for its semantics.
I am reviewing for a test and I can't seem to get this example to code out right.
Problem: Write a perl script, called ileaf, which will linterleave the lines of a file with those of another file writing the result to a third file. If the files are a different length then the excess lines are written at the end.
A sample invocation:
ileaf file1 file2 outfile
This is what I have:
#!/usr/bin/perl -w
open(file1, "$ARGV[0]");
open(file2, "$ARGV[1]");
open(file3, ">$ARGV[2]");
while(($line1 = <file1>)||($line2 = <file2>)){
if($line1){
print $line1;
}
if($line2){
print $line2;
}
}
This sends the information to screen so I can immediately see the result. The final verson should "print file3 $line1;" I am getting all of file1 then all of file2 w/out and interleaving of the lines.
If I understand correctly, this is a function of the use of the "||" in my while loop. The while checks the first comparison and if it's true drops into the loop. Which will only check file1. Once file1 is false then the while checks file2 and again drops into the loop.
What can I do to interleave the lines?
You're not getting what you want from while(($line1 = <file1>)||($line2 = <file2>)){ because as long as ($line1 = <file1>) is true, ($line2 = <file2>) never happens.
Try something like this instead:
open my $file1, "<", $ARGV[0] or die;
open my $file2, "<", $ARGV[1] or die;
open my $file3, ">", $ARGV[2] or die;
while (my $f1 = readline ($file1)) {
print $file3 $f1; #line from file1
if (my $f2 = readline ($file2)) { #if there are any lines left in file2
print $file3 $f2;
}
}
while (my $f2 = readline ($file2)) { #if there are any lines left in file2
print $file3 $f2;
}
close $file1;
close $file2;
close $file3;
You'd think if they're teaching you Perl, they'd use the modern Perl syntax. Please don't take this personally. After all, this is how you were taught. However, you should know the new Perl programming style because it helps eliminates all sorts of programming mistakes, and makes your code easier to understand.
Use the pragmas use strict; and use warnings;. The warnings pragma replaces the need for the -w flag on the command line. It's actually more flexible and better. For example, I can turn off particular warnings when I know they'll be an issue. The use strict; pragma requires me to declare my variables with either a my or our. (NOTE: Don't declare Perl built in variables). 99% of the time, you'll use my. These variables are called lexically scoped, but you can think of them as true local variables. Lexically scoped variables don't have any value outside of their scope. For example, if you declare a variable using my inside a while loop, that variable will disappear once the loop exits.
Use the three parameter syntax for the open statement: In the example below, I use the three parameter syntax. This way, if a file is called >myfile, I'll be able to read from it.
**Use locally defined file handles. Note that I use my $file_1_fh instead of simply FILE_1_HANDLE. The old way, FILE_1_HANDLE is globally scoped, plus it's very difficult to pass the file handle to a function. Using lexically scoped file handles just works better.
Use or and and instead of || and &&: They're easier to understand, and their operator precedence is better. They're more likely not to cause problems.
Always check whether your open statement worked: You need to make sure your open statement actually opened a file. Or use the use autodie; pragma which will kill your program if the open statements fail (which is probably what you want to do anyway.
And, here's your program:
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;
open my $file_1, "<", shift;
open my $file_2, "<", shift;
open my $output_fh, ">", shift;
for (;;) {
my $line_1 = <$file_1>;
my $line_2 = <$file_2>;
last if not defined $line_1 and not defined $line_2;
no warnings qw(uninitialized);
print {$output_fh} $line_1 . $line_2;
use warnings;
}
In the above example, I read from both files even if they're empty. If there's nothing to read, then $line_1 or $line_2 is simply undefined. After I do my read, I check whether both $line_1 and $line_2 are undefined. If so, I use last to end my loop.
Because my file handle is a scalar variable, I like putting it in curly braces, so people know it's a file handle and not a variable I want to print out. I don't need it, but it improves clarity.
Notice the no warnings qw(uninitialized);. This turns off the uninitialized warning I'll get. I know that either $line_1 or $line_3 might be uninitialized, so I don't want the warning. I turn it back on right below my print statement because it is a valuable warning.
Here's another way to do that for loop:
while ( 1 ) {
my $line_1 = <$file_1>;
my $line_2 = <$file_2>;
last if not defined $line_1 and not defined $line_2;
print {$output_fh} $line_1 if defined $line_1;
print {$output_fh} $line_2 if defined $line_2;
}
The infinite loop is a while loop instead of a for loop. Some people don't like the C style of for loop and have banned it from their coding practices. Thus, if you have an infinite loop, you use while ( 1 ) {. To me, maybe because I came from a C background, for (;;) { means infinite loop, and while ( 1 ) { takes a few extra milliseconds to digest.
Also, I check whether $line_1 or $line_2 is defined before I print them out. I guess it's better than using no warning and warning, but I need two separate print statements instead of combining them into one.
Here's another option that uses List::MoreUtils's zip to interleave arrays and File::Slurp to read and write files:
use strict;
use warnings;
use List::MoreUtils qw/zip/;
use File::Slurp qw/read_file write_file/;
chomp( my #file1 = read_file shift );
chomp( my #file2 = read_file shift );
write_file shift, join "\n", grep defined $_, zip #file1, #file2;
Just noticed Tim A has a nice solution already posted. This solution is a bit wordier, but might illustrate exactly what is going on a bit more.
The method I went with reads all of the lines from both files into two arrays, then loops through them using a counter.
#!/usr/bin/perl -w
use strict;
open(IN1, "<", $ARGV[0]);
open(IN2, "<", $ARGV[1]);
my #file1_lines;
my #file2_lines;
while (<IN1>) {
push (#file1_lines, $_);
}
close IN1;
while (<IN2>) {
push (#file2_lines, $_);
}
close IN2;
my $file1_items = #file1_lines;
my $file2_items = #file2_lines;
open(OUT, ">", $ARGV[2]);
my $i = 0;
while (($i < $file1_items) || ($i < $file2_items)) {
if (defined($file1_lines[$i])) {
print OUT $file1_lines[$i];
}
if (defined($file2_lines[$i])) {
print OUT $file2_lines[$i];
}
$i++
}
close OUT;
I have some code that looks like
my ($ids,$nIds);
while (<myFile>){
chomp;
$ids.= $_ . " ";
$nIds++;
}
This should concatenate every line in my myFile, and nIds should be my number of lines. How do I print out my $ids and $nIds?
I tried simply print $ids, but Perl complains.
my ($ids, $nIds)
is a list, right? With two elements?
print "Number of lines: $nids\n";
print "Content: $ids\n";
How did Perl complain? print $ids should work, though you probably want a newline at the end, either explicitly with print as above or implicitly by using say or -l/$\.
If you want to interpolate a variable in a string and have something immediately after it that would looks like part of the variable but isn't, enclose the variable name in {}:
print "foo${ids}bar";
You should always include all relevant code when asking a question. In this case, the print statement that is the center of your question. The print statement is probably the most crucial piece of information. The second most crucial piece of information is the error, which you also did not include. Next time, include both of those.
print $ids should be a fairly hard statement to mess up, but it is possible. Possible reasons:
$ids is undefined. Gives the warning undefined value in print
$ids is out of scope. With use
strict, gives fatal warning Global
variable $ids needs explicit package
name, and otherwise the undefined
warning from above.
You forgot a semi-colon at the end of
the line.
You tried to do print $ids $nIds,
in which case perl thinks that $ids
is supposed to be a filehandle, and
you get an error such as print to
unopened filehandle.
Explanations
1: Should not happen. It might happen if you do something like this (assuming you are not using strict):
my $var;
while (<>) {
$Var .= $_;
}
print $var;
Gives the warning for undefined value, because $Var and $var are two different variables.
2: Might happen, if you do something like this:
if ($something) {
my $var = "something happened!";
}
print $var;
my declares the variable inside the current block. Outside the block, it is out of scope.
3: Simple enough, common mistake, easily fixed. Easier to spot with use warnings.
4: Also a common mistake. There are a number of ways to correctly print two variables in the same print statement:
print "$var1 $var2"; # concatenation inside a double quoted string
print $var1 . $var2; # concatenation
print $var1, $var2; # supplying print with a list of args
Lastly, some perl magic tips for you:
use strict;
use warnings;
# open with explicit direction '<', check the return value
# to make sure open succeeded. Using a lexical filehandle.
open my $fh, '<', 'file.txt' or die $!;
# read the whole file into an array and
# chomp all the lines at once
chomp(my #file = <$fh>);
close $fh;
my $ids = join(' ', #file);
my $nIds = scalar #file;
print "Number of lines: $nIds\n";
print "Text:\n$ids\n";
Reading the whole file into an array is suitable for small files only, otherwise it uses a lot of memory. Usually, line-by-line is preferred.
Variations:
print "#file" is equivalent to
$ids = join(' ',#file); print $ids;
$#file will return the last index
in #file. Since arrays usually start at 0,
$#file + 1 is equivalent to scalar #file.
You can also do:
my $ids;
do {
local $/;
$ids = <$fh>;
}
By temporarily "turning off" $/, the input record separator, i.e. newline, you will make <$fh> return the entire file. What <$fh> really does is read until it finds $/, then return that string. Note that this will preserve the newlines in $ids.
Line-by-line solution:
open my $fh, '<', 'file.txt' or die $!; # btw, $! contains the most recent error
my $ids;
while (<$fh>) {
chomp;
$ids .= "$_ "; # concatenate with string
}
my $nIds = $.; # $. is Current line number for the last filehandle accessed.
How do I print out my $ids and $nIds?
print "$ids\n";
print "$nIds\n";
I tried simply print $ids, but Perl complains.
Complains about what? Uninitialised value? Perhaps your loop was never entered due to an error opening the file. Be sure to check if open returned an error, and make sure you are using use strict; use warnings;.
my ($ids, $nIds) is a list, right? With two elements?
It's a (very special) function call. $ids,$nIds is a list with two elements.