In a single Perl script, can I close and re-open STDIN? - perl

I want to write perl scripts that can read the STDIN that is given at invocation of the script, finish reading it, and then interactively prompt the user for a one-line STDIN. This one-line STDIN will tell the script how to proceed.
In a practical application, I would like the script to create a temporary file, report on the size of temporary file, and then ask the user if they really want to print the entire temporary file to STDOUT, or do they want to give a filename that will be clobbered by the temporary file's contents.
The following script behaves as desired if I give STDIN as a filename but does not work if I pipe STDIN to the script.
#! /usr/bin/perl
use strict; use warnings;
my $count = 0;
while(<>)
{
$count++;
}
print "you counted $count lines. Now do you want to proceed?";
my $answer = <STDIN>;
chomp $answer;
print STDERR "answer=$answer\n";
if ( $answer eq "yes" )
{
print STDERR "you said $answer so we do something affirmative\n";
}
else
{
print STDERR "you said $answer which is not \"yes\" so we do NOT proceed\n";
}
for instance
> wc junk
193 1042 11312 junk
> junk.pl junk
you counted 193 lines. Now do you want to proceed?yes
answer=yes
you said yes so we do something affirmative
> junk.pl junk
you counted 193 lines. Now do you want to proceed?no
answer=no
you said no which is not "yes" so we do NOT proceed
> cat junk | junk.pl
Use of uninitialized value $answer in scalar chomp at /Users/BNW/u/kh/bin/junk.pl line 10.
Use of uninitialized value $answer in concatenation (.) or string at /Users/BNW/u/kh/bin/junk.pl line 11.
answer=
Use of uninitialized value $answer in string eq at /Users/BNW/u/kh/bin/junk.pl line 12.
Use of uninitialized value $answer in concatenation (.) or string at /Users/BNW/u/kh/bin/junk.pl line 18.
you said which is not "yes" so we do NOT proceed
you counted 193 lines. Now do you want to proceed?>

Sort of. Maybe.
First off, in your first example, it's not true that you "gave STDIN as a filename". STDIN is the terminal throughout. <> is reading from the ARGV handle, not STDIN, so STDIN is available later when you need it.
The problem with the second example is that the pipe from cat is STDIN. Closing it and reopening it to what it was initially doesn't do anything for you, because it will still be an exhausted pipe.
Many systems, though, have a special device /dev/tty which points to the controlling terminal of whichever process asks for it. On such a system, you could reopen STDIN from /dev/tty after it gives EOF, and you would get the console that the user invoked your program from, instead of whatever file or pipe they initially gave you as STDIN.
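The reopening step described above can be sketched as follows. This is only an illustration, assuming a system that provides /dev/tty; the helper name reopen_stdin and its optional path parameter are made up here (the parameter exists so the sketch can be tried against an ordinary file).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): point STDIN at $path,
# defaulting to the controlling terminal. Returns 1 on success, 0 on failure.
sub reopen_stdin {
    my ($path) = @_;
    $path = '/dev/tty' unless defined $path;
    # Re-opening an existing handle closes it first, so no explicit close is needed.
    return open(STDIN, '<', $path) ? 1 : 0;
}

# Typical use once the piped input has reached EOF:
#   reopen_stdin() or die "cannot open /dev/tty: $!";
#   print STDERR "proceed? ";
#   my $answer = <STDIN>;
```

On systems without /dev/tty the open simply fails, so the caller should check the return value before prompting.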

Thanks to @hobbs. Note that this works either way: piping the file junk into the script or passing junk as ARGV.
> printf "line 1 \nline 222 \n" > junk
> perl -e 'use strict; use warnings; while(<>) { print; } my $stuff = "/dev/tty"; my $h; open $h, "<", $stuff or die "waah $stuff"; print "give answer:"; my $answer=<$h>; print "answer=$answer\n";' junk
line 1
line 222
give answer:This is an answer!
answer=This is an answer!
> cat junk | perl -e 'use strict; use warnings; while(<>) { print; } my $stuff = "/dev/tty"; my $h; open $h, "<", $stuff or die "waah $stuff"; print "give answer:"; my $answer=<$h>; print "answer=$answer\n";'
line 1
line 222
give answer:So, what was it she was saying?? ??
answer=So, what was it she was saying?? ??
>

Related

Passing arguments containing spaces from one script to another in Perl

I am trying to pass arguments from one Perl script to another. Some of the arguments contain spaces.
I am reading in a comma-delimited text file and splitting each line on the comma.
my ($jockey, $racecourse, $racenum, $hnamenum, $trainer, $TDRating, $PRO) = split(/,/, $line);
The data in the comma-delimited text file look as follows:
AARON LYNCH,WARRNAMBOOL,RACE 1,DAREBIN (8),ERIC MUSGROVE,B,1
When I print out each variable, from the parent script, they look fine (as above).
print "$jockey\n";
print "$racecourse\n";
print "$racenum\n";
print "$hnamenum\n";
print "$trainer\n";
print "$TDRating\n";
print "$PRO\n";
AARON LYNCH
WARRNAMBOOL
RACE 1
DAREBIN (8)
ERIC MUSGROVE
B
1
When I pass the arguments to the child script (as follows), the arguments are passed incorrectly.
system("perl \"$bindir\\narrative4.pl\" $jockey $racecourse $racenum $hnamenum $trainer $TDRating $PRO");
AARON
LYNCH
WARRNAMBOOL
RACE
1
DAREBIN
(8)
As you can see, $ARGV[0] becomes AARON, $ARGV[1] becomes LYNCH, $ARGV[2] becomes WARRNAMBOOL, and so on.
I have investigated adding quotes to the arguments using qq, quotemeta and Win32::ShellQuote. Unfortunately, even if I pass qq{"$jockey"}, the quotes are still stripped before they reach the child script, so they must be protected in some way.
I'm not sure whether any of the aforementioned approaches is the correct one, but I'm happy to be corrected.
I'd appreciate any suggestions. Thanks in advance.
Note: I am running this using Strawberry Perl on a Windows 10 PC.
Note2: I purposely left out use strict; & use warnings; in these examples.
Parent Script
use Cwd;
$dir = getcwd;
$bin = "bin"; $bindir = "$dir/$bin";
$infile = "FINAL-SORTED-JOCKEY-RIDES-FILE.list";
open (INFILE, "<$infile") or die "Could not open $infile $!\n";
while (<INFILE>)
{
$line = $_;
chomp($line);
my ($jockey, $racecourse, $racenum, $hnamenum, $trainer, $TDRating, $PRO) = split(/,/, $line);
print "$jockey\n";
print "$racecourse\n";
print "$racenum\n";
print "$hnamenum\n";
print "$trainer\n";
print "$TDRating\n";
print "$PRO\n";
system("perl \"$bindir\\narrative4.pl\" $jockey $racecourse $racenum $hnamenum $trainer $TDRating $PRO");
sleep (1);
}
close INFILE;
exit;
Child Script
$passedjockey = $ARGV[0];
$passedracecourse = $ARGV[1];
$passedracenum = $ARGV[2];
$passedhnamenum = $ARGV[3];
$passedtrainer = $ARGV[4];
$passedTDRating = $ARGV[5];
$passedPRO = $ARGV[6];
print "$passedjockey\n";
print "$passedracecourse\n";
print "$passedracenum\n";
print "$passedhnamenum\n";
print "$passedtrainer\n";
print "$passedTDRating\n";
print "$passedPRO\n\n";
That whole double-quoted string that is passed to system is first evaluated and thus all variables are interpolated -- so the intended multi-word arguments become merely words in a list. So in the end the string has a command to run with individual words as arguments.
Then, even if you figure out how to stick quotes in there just right, so as to keep those multi-word arguments "together," there's still a chance of a shell being invoked, in which case those arguments again get broken up into words before being passed to the program.
Instead of all this use the LIST form of system. The first argument is then the name of the program that will be directly executed without a shell (see docs for some details on that), and the remaining arguments are passed as they are to that program.
parent
use warnings;
use strict;
use feature 'say';
my @args = ('first words', 'another', 'two more', 'final');
my $prog = 'print_args.pl';
system($prog, @args) == 0
or die "Error w/ system($prog, @args): $!";
and the invoked print_args.pl
use warnings;
use strict;
use feature 'say';
say for @ARGV;
The @ARGV contains arguments passed to the program at invocation. There's more that can be done to inspect the error, see docs and links in them.†
By what you show you indeed don't need a shell and the LIST form is generally easy to recommend as a basic way to use system, when the shell isn't needed. If you were to need shell's capabilities for something in that command then you'd have to figure out how to protect those spaces.
† And then there are modules for running external programs that are far better than system & Co. From ease-of-use to features and power:
IPC::System::Simple, Capture::Tiny, IPC::Run3, IPC::Run.
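Applied to the data in the question, the LIST form might look roughly like this. It is a sketch under assumptions: child_command is a hypothetical helper name, and the fields are assumed never to contain a comma themselves.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for the child script from one
# comma-delimited line. Each field stays a single argument, spaces and all.
sub child_command {
    my ($script, $line) = @_;
    chomp $line;
    my @fields = split /,/, $line;
    return ($^X, $script, @fields);    # $^X is the perl running this script
}

# In the parent loop this would be used roughly as:
#   my @cmd = child_command("$bindir/narrative4.pl", $line);
#   system(@cmd) == 0 or die "system(@cmd) failed: $?";
```

How a list is turned into a command line differs on Windows, so it is worth printing @ARGV in the child once to confirm the arguments arrive intact.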

Can one concatenate two Perl scripts which use different input record separators?

Two Perl scripts, using different input record separators, work together to convert a LaTeX file into something easily searched for human-readable phrases and sentences. Of course, they could be wrapped together by a single shell script. But I am curious whether they can be incorporated into a single Perl script.
The reason for these scripts: It would be a hassle to find "two three" inside short.tex, for instance. But after conversion, grep 'two three' will return the first paragraph.
For any LaTeX file (here, short.tex), the scripts are invoked as follows.
cat short.tex | try1.pl | try2.pl
try1.pl works on paragraphs. It gets rid of LaTeX comments. It makes sure that each word is separated from its neighbors by a single space, so that no sneaky tabs, form feeds, etc., lurk between words. The resulting paragraph occupies a single line, consisting of visible characters separated by single spaces --- and at the end, a sequence of at least two newlines.
try2.pl slurps the entire file. It makes sure that paragraphs are separated from each other by exactly two newlines. And it ensures that the last line of the file is non-trivial, containing visible character(s).
Can one elegantly concatenate two operations such as these, which depend on different input record separators, into a single Perl script, say big.pl? For instance, could the work of try1.pl and try2.pl be accomplished by two functions or bracketed segments inside the larger script?
Incidentally, is there a Stack Overflow keyword for "input record separator"?
###File try1.pl:
#!/usr/bin/perl
use strict;
use warnings;
use 5.18.2;
local $/ = ""; # input record separator: loop through one paragraph at a time. position marker $ comes only at end of paragraph.
while (<>) {
s/[\x25].*\n/ /g; # remove all LaTeX comments. They start with %
s/[\t\f\r ]+/ /g; # collapse each "run" of whitespace to one single space
s/^\s*\n/\n/g; # any line that looks blank is converted to a pure newline;
s/(.)\n/$1/g; # Any line that does not look blank is joined to the subsequent line
print;
print "\n\n"; # make sure each paragraph is separated from its fellows by newlines
}
###File try2.pl:
#!/usr/bin/perl
use strict;
use warnings;
use 5.18.2;
local $/ = undef; # input record separator: entire text or file is a single record.
while (<>) {
s/[\n][\n]+/\n\n/g; # exactly 2 blank lines separate paragraphs. Like cat -s
s/[\n]+$/\n/; # last line is nontrivial; no blank line at the end
print;
}
###File short.tex:
\paragraph{One}
% comment
two % also 2
three % or 3
% comment
% comment
% comment
% comment
% comment
% comment
So they said%
that they had done it.
% comment
% comment
% comment
Fleas.
% comment
% comment
After conversion:
\paragraph{One} two three
So they said that they had done it.
Fleas.
To combine try1.pl and try2.pl into a single script you could try:
local $/ = "";
my @lines;
while (<>) {
[...] # Same code as in try1.pl except print statements
push @lines, $_;
}
$lines[-1] =~ s/\n+$/\n/;
print for @lines;
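Another way to keep the two record separators apart is to give each pass its own sub and set $/ with local inside it, so the change disappears when the sub returns. The following is a simplified sketch, not the original try1.pl and try2.pl: the sub names are invented, the text is passed around as strings, and the comment handling is reduced to a single substitution.

```perl
use strict;
use warnings;

# Pass 1 (paragraph mode): one cleaned-up line per paragraph.
sub clean_paragraphs {
    my ($text) = @_;
    open my $in, '<', \$text or die "cannot read from string: $!";
    local $/ = "";                       # paragraph mode, only inside this sub
    my $out = '';
    while (my $para = <$in>) {
        $para =~ s/(?<!\\)%.*\n/ /g;     # drop LaTeX comments (but not \%)
        $para =~ s/\s+/ /g;              # collapse every run of whitespace
        $para =~ s/^\s+|\s+$//g;         # trim both ends
        $out .= "$para\n\n" if length $para;
    }
    return $out;
}

# Pass 2 (whole text): normalise the gaps between paragraphs.
sub tidy_whole {
    my ($text) = @_;
    $text =~ s/\n{2,}/\n\n/g;            # exactly one blank line between paragraphs
    $text =~ s/\n+\z/\n/;                # no blank line at the end
    return $text;
}

# Possible use in big.pl:
#   my $input = do { local $/; <> };     # slurp STDIN or the named files
#   print tidy_whole(clean_paragraphs($input));
```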
A pipe connects the output of one process to the input of another process. Neither one knows about the other nor cares how it operates.
But, putting things together like this breaks the Unix pipeline philosophy of small tools that each excel at a very narrow job. Should you link these two things, you'll always have to do both tasks even if you want one (although you could get into configuration to turn off one, but that's a lot of work).
I process a lot of LaTeX, and I control everything through a Makefile. I don't really care about what the commands look like and I don't even have to remember what they are:
short-clean.tex: short.tex
cat short.tex | try1.pl | try2.pl > $@
Let's do it anyways
I'll limit myself to the constraint of basic concatenation instead of complete rewriting or rearranging, mostly because there are some interesting things to show.
Consider what happens should you concatenate those two programs by simply adding the text of the second program at the end of the text of the first program.
The output from the original first program still goes to standard output and the second program now doesn't get that output as input.
The input to the program is likely exhausted by the original first program and the second program now has nothing to read. That's fine because it would have read the unprocessed input to the first program.
There are various ways to fix this, but none of them make much sense when you already have two working programs that do their job. I'd shove that in the Makefile and forget about it.
But, suppose you do want it all in one file.
Rewrite the first section to send its output to a filehandle connected to a string. Its output is now in the program's memory. This basically uses the same interface, and you can even use select to make that the default filehandle.
Rewrite the second section to read from a filehandle connected to that string.
Alternately, you can do the same thing by writing to a temporary file in the first part, then reading that temporary file in the second part.
A much more sophisticated program would have the first program write to a pipe (inside the program) that the second program is simultaneously reading. However, you have to pretty much rewrite everything so the two programs are happening simultaneously.
Here's Program 1, which uppercases most of the letters:
#!/usr/bin/perl
use v5.26;
$|++;
while( <<>> ) { # safer line input operator
print tr/a-z/A-Z/r;
}
and here's Program 2, which collapses whitespace:
#!/usr/bin/perl
use v5.26;
$|++;
while( <<>> ) { # safer line input operator
print s/\s+/ /gr;
}
They work serially to get the job done:
$ perl program1.pl
The quick brown dog jumped over the lazy fox.
THE QUICK BROWN DOG JUMPED OVER THE LAZY FOX.
^D
$ perl program2.pl
The quick brown dog jumped over the lazy fox.
The quick brown dog jumped over the lazy fox.
^D
$ perl program1.pl | perl program2.pl
The quick brown dog jumped over the lazy fox.
THE QUICK BROWN DOG JUMPED OVER THE LAZY FOX.
^D
Now I want to combine those. First, I'll make some changes that don't affect the operation but will make it easier for me later. Instead of using implicit filehandles, I'll make those explicit and one level removed from the actual filehandles:
Program 1:
#!/usr/bin/perl
use v5.26;
$|++;
my $output_fh = \*STDOUT;
while( <<>> ) { # safer line input operator
print { $output_fh } tr/a-z/A-Z/r;
}
Program 2:
#!/usr/bin/perl
$|++;
my $input_fh = \*STDIN;
while( <$input_fh> ) { # safer line input operator
print s/\s+/ /gr;
}
Now I have the chance to change what those filehandles are without disturbing the meat of the program. The while doesn't know or care what that filehandle is, so let's start by writing to a file in Program 1 and reading from that same file in Program 2:
Program 1:
#!/usr/bin/perl
use v5.26;
open my $output_fh, '>', 'program1.out' or die "$!";
while( <<>> ) { # safer line input operator
print { $output_fh } tr/a-z/A-Z/r;
}
close $output_fh;
Program 2:
#!/usr/bin/perl
$|++;
open my $input_fh, '<', 'program1.out' or die "$!";
while( <$input_fh> ) { # safer line input operator
print s/\h+/ /gr;
}
However, you can no longer run these in a pipeline because Program 1 doesn't use standard output and Program 2 doesn't read standard input:
% perl program1.pl
% perl program2.pl
You can, however, now join the programs, shebang and all:
#!/usr/bin/perl
use v5.26;
open my $output_fh, '>', 'program1.out' or die "$!";
while( <<>> ) { # safer line input operator
print { $output_fh } tr/a-z/A-Z/r;
}
close $output_fh;
#!/usr/bin/perl
$|++;
open my $input_fh, '<', 'program1.out' or die "$!";
while( <$input_fh> ) { # safer line input operator
print s/\h+/ /gr;
}
You can skip the file and use a string instead, but at this point, you've gone beyond merely concatenating files and need a little coordination for them to share the scalar with the data. Still, the meat of the program doesn't care how you made those filehandles:
#!/usr/bin/perl
use v5.26;
my $output_string;
open my $output_fh, '>', \ $output_string or die "$!";
while( <<>> ) { # safer line input operator
print { $output_fh } tr/a-z/A-Z/r;
}
close $output_fh;
#!/usr/bin/perl
$|++;
open my $input_fh, '<', \ $output_string or die "$!";
while( <$input_fh> ) { # safer line input operator
print s/\h+/ /gr;
}
So let's go one step further and do what the shell was already doing for us.
#!/usr/bin/perl
use v5.26;
pipe my $input_fh, my $output_fh;
$output_fh->autoflush(1);
while( <<>> ) { # safer line input operator
print { $output_fh } tr/a-z/A-Z/r;
}
close $output_fh;
while( <$input_fh> ) { # safer line input operator
print s/\h+/ /gr;
}
From here, it gets a bit tricky and I'm not going to go to the next step with polling filehandles so one thing can write and the next thing reads. There are plenty of things that do that for you. And, you're now doing a lot of work to avoid something that was already simple and working.
Instead of all that pipe nonsense, the next step is to separate code into functions (likely in a library), and deal with those chunks of code as named things that hide their details:
use Local::Util qw(remove_comments minify);
while( <<>> ) {
my $result = remove_comments($_);
$result = minify( $result );
...
}
That can get even fancier where you simply go through a series of steps without knowing what they are or how many of them there will be. And, since all the baby steps are separate and independent, you're basically back to the pipeline notion:
use Local::Util qw(get_input remove_comments minify);
my $result;
my @steps = qw(get_input remove_comments minify);
while( ! eof() ) { # or whatever
no strict 'refs';
$result = &{$_}( $result ) for @steps;
}
A better way makes that an object so you can skip the soft reference:
use Local::Processor;
my @steps = qw(get_input remove_comments minify);
my $processor = Local::Processor->new( @steps );
my $result;
while( ! eof() ) { # or whatever
$result = $processor->$_($result) for @steps;
}
Like I did before, the meat of the program doesn't care or know about the steps ahead of time. That means that you can move the sequence of steps to configuration and use the same program for any combination and sequence:
use Local::Config;
use Local::Processor;
my @steps = Local::Config->new->get_steps;
my $processor = Local::Processor->new;
my $result;
while( ! eof() ) { # or whatever
$result = $processor->$_($result) for @steps;
}
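A lighter-weight variant of the same idea uses a table of code references, which avoids both the soft reference and a separate class. This is a sketch only: the step names mirror the hypothetical ones above, and their bodies are stand-ins invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical steps; each takes a string and returns the transformed string.
my %step_for = (
    remove_comments => sub { my ($t) = @_; $t =~ s/(?<!\\)%.*//g; return $t },
    minify          => sub { my ($t) = @_; $t =~ s/\s+/ /g;       return $t },
);

# Run the named steps in order, feeding each result to the next step.
sub run_steps {
    my ($text, @names) = @_;
    for my $name (@names) {
        my $step = $step_for{$name} or die "unknown step '$name'\n";
        $text = $step->($text);
    }
    return $text;
}

# Possible use, with the step list coming from configuration:
#   print run_steps($_, qw(remove_comments minify)), "\n" while <>;
```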
I write quite a bit about this sort of stuff in Mastering Perl and Effective Perl Programming. But, because you can do it doesn't mean you should. This reinvents a lot that make can already do for you. I don't do this sort of thing without good reason—bash and make have to be pretty annoying to motivate me to go this far.
The motivating problem was to generate a "cleaned" version of a LaTeX file, which would be easy to search, using regex, for complex phrases or sentences.
The following single Perl script does the job, whereas previously I required one shell script and two Perl scripts, entailing three invocations of Perl. This new, single script incorporates three consecutive loops, each with a different input record separator.
First loop:
input = STDIN, or a file passed as argument;
record separator = default, loop by line;
print result to fileafterperlLIN, a temporary file on the hard drive.
Second loop:
input = fileafterperlLIN;
record separator = "", loop by paragraph;
print result to fileafterperlPRG, a temporary file on the hard drive.
Third loop:
input = fileafterperlPRG;
record separator = undef, slurp entire file
print result to STDOUT
This has the disadvantage of printing to and reading from two files on the hard drive, which may slow it down. Advantages are that the operation seems to require only one process; and all the code resides in a single file, which should make it easier to maintain.
#!/usr/bin/perl
# 2019v04v05vFriv17h18m41s
use strict;
use warnings;
use 5.18.2;
my $diagnose;
my $diagnosticstring;
my $exitcode;
my $userName = $ENV{'LOGNAME'};
my $scriptpath;
my $scriptname;
my $scriptdirectory;
my $cdld;
my $fileafterperlLIN;
my $fileafterperlPRG;
my $handlefileafterperlLIN;
my $handlefileafterperlPRG;
my $encoding;
my $count;
sub diagnosticmessage {
return unless ( $diagnose );
print STDERR "$scriptname: ";
foreach $diagnosticstring (@_) {
print STDERR "$diagnosticstring\n"; # print, not printf: the message may contain %
}
}
# Routine setup
$scriptpath = $0;
$scriptname = $scriptpath;
$scriptname =~ s|.*\x2f([^\x2f]+)$|$1|;
$cdld = "$ENV{'cdld'}"; # A directory to hold temporary files used by scripts
$exitcode = system("test -d $cdld && test -w $cdld || { printf '%s\n' 'cdld not a writeable directory'; exit 1; }");
die "$scriptname: system returned exitcode=$exitcode: bail\n" unless $exitcode == 0;
$scriptdirectory = "$cdld/$scriptname"; # To hold temporary files used by this script
$exitcode = system("test -d $scriptdirectory || mkdir $scriptdirectory");
die "$scriptname: system returned exitcode=$exitcode: bail\n" unless $exitcode == 0;
diagnosticmessage ( "scriptdirectory=$scriptdirectory" );
$exitcode = system("test -w $scriptdirectory && test -x $scriptdirectory || exit 1;");
die "$scriptname: system returned exitcode=$exitcode: $scriptdirectory not writeable or not executable. bail\n" unless $exitcode == 0;
$fileafterperlLIN = "$scriptdirectory/afterperlLIN.tex";
diagnosticmessage ( "fileafterperlLIN=$fileafterperlLIN" );
$exitcode = system("printf '' > $fileafterperlLIN;");
die "$scriptname: system returned exitcode=$exitcode: bail\n" unless $exitcode == 0;
$fileafterperlPRG = "$scriptdirectory/afterperlPRG.tex";
diagnosticmessage ( "fileafterperlPRG=$fileafterperlPRG" );
$exitcode=system("printf '' > $fileafterperlPRG;");
die "$scriptname: system returned exitcode=$exitcode: bail\n" unless $exitcode == 0;
# This script's job: starting with a LaTeX file, which may compile beautifully in pdflatex but be difficult
# to read visually or search automatically,
# (1) convert any line that looks blank --- a "trivial line", containing only whitespace --- to a pure newline. This is because
# (a) LaTeX interprets any whitespace line following a non-blank or "nontrivial" line as end of paragraph, whereas
# (b) Perl needs two consecutive newlines to signal end of paragraph.
# (2) remove all LaTeX comments;
# (3) deal with the \unskip LaTeX construct, etc.
# The result will be
# (4) each LaTeX paragraph will occupy a unique line
# (5) exactly one pair of newlines --- visually, one blank line --- will divide each pair of consecutive paragraphs
# (6) first paragraph will be on first line (no opening blank line) and last paragraph will be on last line (no ending blank line)
# (7) whitespace in output will consist of only
# (a) a single space between readable strings, or
# (b) double newline between paragraphs
#
$handlefileafterperlLIN = undef;
$handlefileafterperlPRG = undef;
$encoding = ":encoding(UTF-8)";
diagnosticmessage ( "fileafterperlLIN=$fileafterperlLIN" );
open($handlefileafterperlLIN, ">> $encoding", $fileafterperlLIN) || die "$0: can't open $fileafterperlLIN for appending: $!";
# Loop 1 / line:
# Default input record separator: loop through one line at a time, delimited by \n
$count = 0;
while (<>) {
$count = $count + 1;
diagnosticmessage ( "line $count" );
s/^\s*\n/\n/mg; # Convert any trivial line to a pure newline.
print $handlefileafterperlLIN $_;
}
close($handlefileafterperlLIN);
open($handlefileafterperlLIN, "< $encoding", $fileafterperlLIN) || die "$0: can't open $fileafterperlLIN for reading: $!";
open($handlefileafterperlPRG, ">> $encoding", $fileafterperlPRG) || die "$0: can't open $fileafterperlPRG for appending: $!";
# Loop PRG / paragraph:
local $/ = ""; # Input record separator: loop through one paragraph at a time. position marker $ comes only at end of paragraph.
$count = 0;
while (<$handlefileafterperlLIN>) {
$count = $count + 1;
diagnosticmessage ( "paragraph $count" );
s/(?<!\x5c)[\x25].*\n/ /g; # Remove all LaTeX comments.
# They start with % not \% and extend to end of line or newline character. Join to next line.
# s/(?<!\x5c)([\x24])/\x2a/g; # 2019v04v01vMonv13h44m09s any $ not preceded by backslash \, replace $ by * or something.
# This would be only if we are going to run detex on the output.
s/(.)\n/$1 /g; # Any line that has something other than newline, and then a newline, is joined to the subsequent line
s|([^\x2d])\s*(\x2d\x2d\x2d)([^\x2d])|$1 $2$3|g; # consistent treatment of triple hyphen as em dash
s|([^\x2d])(\x2d\x2d\x2d)\s*([^\x2d])|$1$2 $3|g; # consistent treatment of triple hyphen as em dash, continued
s/[\x0b\x09\x0c\x20]+/ /gm; # collapse each "run" of whitespace other than newline, to a single space.
s/\s*[\x5c]unskip(\x7b\x7d)?\s*(\S)/$2/g; # LaTeX whitespace-collapse across newlines
s/^\s*//; # Any nontrivial line: No indenting. No whitespace in first column.
print $handlefileafterperlPRG $_;
print $handlefileafterperlPRG "\n\n"; # make sure each paragraph ends with 2 newlines, hence at least 1 blank line.
}
close($handlefileafterperlPRG);
open($handlefileafterperlPRG, "< $encoding", $fileafterperlPRG) || die "$0: can't open $fileafterperlPRG for reading: $!";
# Loop slurp
local $/ = undef; # Input record separator: entire file is a single record.
$count = 0;
while (<$handlefileafterperlPRG>) {
$count = $count + 1;
diagnosticmessage ( "slurp $count" );
s/[\n][\n]+/\n\n/g; # Exactly 2 blank lines (newlines) separate paragraphs. Like cat -s
s/[\n]+$/\n/; # Last line is visible or "nontrivial"; no trivial (blank) line at the end
s/^[\n]+//; # No trivial (blank) line at the start. The first line is "nontrivial."
print STDOUT;
}
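The directory checks that the script hands to test and mkdir through system could also be done with Perl's own file-test operators, which avoids starting a shell for each check. A minimal sketch; ensure_work_dir is a hypothetical name and its error messages are invented.

```perl
use strict;
use warnings;

# Hypothetical replacement for the system("test ...") calls: make sure
# $parent/$name exists and is usable, then return its path.
sub ensure_work_dir {
    my ($parent, $name) = @_;
    die "$parent is not a writable directory\n"
        unless -d $parent && -w $parent;
    my $dir = "$parent/$name";
    if (! -d $dir) {
        mkdir $dir or die "cannot create $dir: $!\n";
    }
    die "$dir is not writable or not searchable\n"
        unless -w $dir && -x $dir;
    return $dir;
}

# Possible use in the setup section:
#   $scriptdirectory = ensure_work_dir($cdld, $scriptname);
```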

perl error: Use of uninitialized value $_ in concatenation (.) or string

I get the following error:
Use of uninitialized value $_ in concatenation (.) or string at checkfornewfiles.pl line 34.
when attempting to run the following code :
#!/usr/bin/perl -w
#Author: mimo
#Date 3/2015
#Purpose: monitor directory for new files...
AscertainStatus();
######### start of subroutine ########
sub AscertainStatus {
my $DIR= "test2";
####### open handler #############
opendir (HAN1, "$DIR") || die "Problem: $!";
########## assign theoutput of HAN1 to array1 ##########
my @array1= readdir(HAN1);
######## adding some logic #########
if ("$#array1" > 1) { #### if files exists (more than 1) in the directory #######
for (my $i=0; $i<2; $i++) {shift @array1;} ####### for i in position 0 (which is the . position) loop twice and add one (the position ..) get rid of them #######
MailNewFiles(@array1);
} else { print "No New Files\n";}
}
sub MailNewFiles {
$mail= "sendmail";
open ($mail, "| /usr/lib/sendmail -oi -t" ) ||die "errors with sendmail $!"; # open handler and pipe it to sendmail
print $mail <<"EOF"; #print till the end of fiEOF
From: "user";
To: "root";
Subject: "New Files Found";
foreach (@_) {print $mail "new file found:\n $_\n";}
EOF
close($mail);
}
#End
I am new to perl and I don't know what's going wrong. Can anyone help me ?
A few suggestions:
Perl isn't C. Your main program loop shouldn't be a declared subroutine which you then execute. Eliminate the AscertainStatus subroutine.
Always, always use strict; and use warnings;.
Indent correctly. It makes it much easier for people to read your code and help analyze what you did wrong.
Use a more modern Perl coding style. Perl is an old language, and over the years new coding style and techniques have been developed to help you eliminate basic errors and help others read your code.
Don't use system commands when there are Perl modules that can do this for you in a more standard way, and probably do better error checking. Perl comes with the Net::SMTP that handles mail communication for you. Use that.
The error Use of uninitialized value $_ in concatenation (.) or string is exactly what it says. You are attempting to use a value of a variable that hasn't been set. In this case, the variable is the $_ that appears on your foreach line. Your foreach isn't a true foreach, but part of the text of your print statement, since your EOF comes after the foreach line. This looks like an error.
Also, what is the value of @_? This variable contains the list of values that have been passed to your subroutine. If none are passed, it is simply an empty list, and a real foreach (@_) would skip the loop. However, since foreach (@_) { ... } sits inside the here-document, it is just a string to print: Perl interpolates @_ and $_ into it, and $_ has not been set at that point, hence the warning.
If you remove the -w from #!/usr/bin/perl, your program will actually "work" (Note the quotes), and you'll see that your foreach will literally print.
I don't recommend turning off warnings, which is what removing -w does. In fact, I recommend use warnings; rather than -w. However, in this case, it might help you see your error.
You have EOF after the line with foreach. It contains $_ which is interpolated here but $_ is not initialized yet because it is not in foreach loop. It is not code but just text. Move EOF before foreach.
But probably you would like
sub MailNewFiles {
$mail= "sendmail";
open ($mail, "| /usr/lib/sendmail -oi -t" ) ||die "errors with sendmail $!"; # open handler and pipe it to sendmail
local $"="\n"; # " make syntax highlight happy
print $mail <<"EOF"; #print till the end of fiEOF
From: "user";
To: "root";
Subject: "New Files Found";
New files found:
@_
EOF
close($mail);
}
See perlvar for more information about $".
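To see what the here-document produces without involving sendmail, the message can be built as a plain string first. This is a sketch: new_files_message is a made-up helper name, and the header values are simplified versions of the ones in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the mail text for the given file names.
sub new_files_message {
    my @files = @_;
    local $" = "\n";    # list separator used when an array is interpolated
    return <<"EOF";
From: user
To: root
Subject: New Files Found

New files found:
@files
EOF
}

# Possible use inside MailNewFiles:
#   print $mail new_files_message(@_);
```

Because $" is set with local, it reverts to a single space as soon as the sub returns.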
The message
Use of uninitialized value $xxx in ...
is very straightforward. When you encounter it, it means that you are using a variable ($xxx) in some way, but that the variable has never been initialized.
Sometimes, adding an initialization statement at the start of your code is enough:
my $str = '';
my $num = 0;
Sometimes, your algorithm is wrong, or you just mistyped your variable, as in:
my $foo = 'foo';
my $bar = $ffo . 'bar'; # << There is a warning on this line
# << because you made a mistake on $foo ($ffo)

How to delete a pattern matching and the rest of the line in a file using Perl

I'm new to Perl programming and I would like to find a pattern in a file and delete it along with the rest of the line. For example,
"input file"
>hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p
UGAGGUAGUAGGUUGUAUAGUU
>hsa-let-7a-3p MIMAT0004481 Homo sapiens let-7a-3p
CUAUACAAUCUACUGUCUUUC
>hsa-let-7a-2-3p MIMAT0010195 Homo sapiens let-7a-2-3p
CUGUACAGCCUCCUAGCUUUCC
>hsa-let-7b-5p MIMAT0000063 Homo sapiens let-7b-5p
UGAGGUAGUAGGUUGUGUGGUU
"desired output file"
>hsa-let-7a-5p MIMAT0000062
UGAGGUAGUAGGUUGUAUAGUU
>hsa-let-7a-3p MIMAT0004481
CUAUACAAUCUACUGUCUUUC
>hsa-let-7a-2-3p MIMAT0010195
CUGUACAGCCUCCUAGCUUUCC
>hsa-let-7b-5p MIMAT0000063
UGAGGUAGUAGGUUGUGUGGUU
I want to find the string "Homo sapiens" and delete it as well as the rest of the line.
I wrote the following code, but it does not work.
#!/usr/bin/perl
use strict;
use warnings;
my $find = "Homo sapiens"; #string for searching
open (FILE1, "input.fasta") || die "Cannot open the file!"; #open for reading
open (FILE2, ">>output.fasta") || die "Cannot open the file!"; #open for writing
while (my $line = <FILE1>){
if ($line =~ /$find/){
print FILE2 $line;
print FILE2 scalar <FILE1>;
}
}
close(FILE1);
close(FILE2);
exit;
Thanks
The majority of the Linux world has a fascination with one-line programs, so here is a one-line solution that does as you ask
perl -pe's/\s*Homo Sapiens.*//i' input.txt
It will make the changes that you describe and send the result to STDOUT.
If you want to write the altered text to a new file then simply redirect the output, with something like
perl -pe's/\s*Homo Sapiens.*//i' input.txt > fixed.txt
output
>hsa-let-7a-5p MIMAT0000062
UGAGGUAGUAGGUUGUAUAGUU
>hsa-let-7a-3p MIMAT0004481
CUAUACAAUCUACUGUCUUUC
>hsa-let-7a-2-3p MIMAT0010195
CUGUACAGCCUCCUAGCUUUCC
>hsa-let-7b-5p MIMAT0000063
UGAGGUAGUAGGUUGUGUGGUU
If you are not one of those people and you need help to write the equivalent Perl program then please ask.
Update
An equivalent program would look like this. I've called it sapiens.pl. You would run it from the command line with the input file as a parameter, such as
sapiens.pl input.txt > fixed.txt
#!/usr/bin/perl
use strict;
use warnings;
my $remove = 'Homo sapiens';
while (<>) {
s/\s*$remove.*//i;
print;
}
I would replace your while loop with the following.
while (<FILE1>){
s/$find.*//;
print FILE2 $_;
}
I loaded the line into the default variable $_ by not assigning it to any other variable, and then applied the substitution operator to it. I am substituting your variable $find and any characters that come after it in the line with the empty string. We don't need to check whether the substitution worked. If it did, then we removed the unwanted characters; if not, then we want the entire line.
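The same substitution can be wrapped in a small sub, which makes it easy to try on a single line. A sketch, assuming the search text should be matched literally; strip_from is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: remove $find and everything after it on the line,
# together with the whitespace just before it. The newline is kept because
# "." does not match "\n".
sub strip_from {
    my ($line, $find) = @_;
    $line =~ s/\s*\Q$find\E.*//;    # \Q...\E treats $find as literal text
    return $line;
}

# Possible use in the loop:
#   print FILE2 strip_from($_, $find) while <FILE1>;
```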

How to replace ^M with a new line in perl

My test file has "n" number of lines and between each line there is a ^M, which in turn makes it one big string. The code I am working with opens said file and should parse out a header and then the subsequent rows, then searches for the Directory Path and File name. But because the file just ends up as a big string it doesn't work correctly
#!/usr/bin/perl
#use strict;
#use warnings;
open (DATA, "<file.txt") or die ("Unable to open file");
my $search_string = "Directory Path";
my $column_search = "Filename";
my $header = <DATA>;
my @header_titles = split /\t/, $header;
my $extract_col = 0;
my $col_search = 0;
for my $header_line (@header_titles) {
last if $header_line =~ m/$search_string/;
$extract_col++;
}
for my $header_line (@header_titles) {
last if $header_line =~m/$column_search/;
$col_search++;
}
print "Extracting column $extract_col $search_string\n";
while ( my $row = <DATA> ) {
last unless $row =~ /\S/;
chomp $row;
my @cells = split /\t/, $row;
$cells[74]=~s/:/\//g;
$cells[$extract_col]= $cells[74] . $cells[$col_search];
print "$cells[$extract_col] \n";
}
When I open the test file in vi I have used
:%s/^M/\r/g
and that removes the ^M's, but how do I do it inside this Perl program? When I tried a test program, inserted that s\^M/\r/g, and had it write to a different file, the output came up as a lot of Chinese characters.
If mac2unix isn't working for you, you can write your own mac2unix as a Perl one-liner:
perl -pi -e 'tr/\r/\n/' file.txt
That will likely fail if the size of the file is larger than virtual memory though, as it reads the whole file into memory.
For completeness, let's also have a dos2unix:
perl -pi -e 'tr/\r//d' file.txt
and a unix2dos:
perl -pi -e 's/\n/\r\n/g' file.txt
Before you start reading the file, set $/ to "\r". This is set to the linefeed character by default, which is fine for UNIX-style line endings, and almost OK for DOS-style line endings, but useless for the old Mac-style line endings you are seeing. You can also try mac2unix on your input file if you have it installed.
For more, look for "INPUT_RECORD_SEPARATOR" in the perlvar manpage.
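A small sketch of that approach, with $/ changed only inside a sub so the rest of the program keeps the default. The sub name read_cr_lines is made up, and the data is assumed to use a bare carriage return as its only line ending.

```perl
use strict;
use warnings;

# Hypothetical helper: read every "\r"-terminated record from $fh.
sub read_cr_lines {
    my ($fh) = @_;
    local $/ = "\r";        # old Mac-style line ending, only inside this sub
    my @lines = <$fh>;
    chomp @lines;           # chomp removes the current $/, i.e. the "\r"
    return @lines;
}

# Possible use with the file from the question:
#   open my $fh, '<', 'file.txt' or die "Unable to open file: $!";
#   my ($header, @rows) = read_cr_lines($fh);
```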
Did this file originate on a windows system? If so, try running the dos2unix command on the file before reading it. You can do this before invoking the perl script or inside the script before you read it.
You might want to set $/ (the input record separator) to a carriage return (^M) at the beginning of your script, such as:
$/ = "\r";
perl -MExtUtils::Command -e dos2unix file