Operator <> issue - perl

I create a file named fred:
fred_1
fred_2
fred_3
I write a Perl script named lac.pl:
while(<>){
print reverse <>;
}
Then I execute the below command.
Why did the output miss fred_1, and the program not end, just waiting for my input?
If I update the Perl script like below, the result will be right:
print reverse <>;

You miss the first line, because that was read into $_ with your while(<>).
print reverse <>;
The <> is evaluated in list context. That is, all the remaining input is slurped into a list which is reversed and printed.
Now, the loop goes back to the conditional, waiting for another line on input. Try
print reverse <>;
by itself, not in a loop.
Of course, slurping like this will make the memory footprint of your program proportional to the size of its input which is not practical for large inputs.
If you do need to reverse large files, use File::ReadBackwards:
#!/usr/bin/env perl
use strict;
use warnings;
use File::ReadBackwards;
for my $arg (@ARGV) {
my $reader = File::ReadBackwards->new($arg)
or die "Cannot read '$arg': $!";
while (defined(my $line = $reader->readline)) {
print $line;
}
}

Why was it wrong? Because while(<>) reads a line into $_, and then you ignore that value. The <> inside the loop is in list context, so it reads everything that is left. When the loop condition then evaluates <> again, every file named on the command line has already been used up, so Perl falls back to reading standard input and waits for you to type something.
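The difference between the two contexts can be seen in isolation with a minimal sketch. It assumes an in-memory filehandle as a stand-in for the file fred (the variable names are only for illustration), so nothing is read from the command line or from the keyboard:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: an in-memory filehandle stands in for the file "fred".
my $content = "fred_1\nfred_2\nfred_3\n";
open my $fh, '<', \$content or die "Cannot open in-memory file: $!";

# Scalar context: exactly one line is read, as in the while(<>) condition.
my $first = <$fh>;

# List context: all remaining lines are read at once, then reversed.
my @rest = reverse <$fh>;

print "consumed by the loop condition: $first";
print "printed by reverse: $_" for @rest;
```

With the real <> operator the only extra step is the fallback to standard input once the files in @ARGV are exhausted.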

Related

Remove multiple duplicate lines from a file

I have a Perl script, run from crontab, that generates a file full of duplicate entries, because on each run it rewrites information previously written.
I could run sort -u on the file, but I would like to do it at the end of the Perl script itself.
My list
10/10/2017 00:01:39:000;Sagitter
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
10/12/2017 00:09:00:459;Sagitter
10/13/2017 01:11:03:009;Lupus
12/13/2017 04:29:00:609;Ariet
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
...
My code
#!/usr/bin/perl
# Libraries
use strict;
use warnings 'all';
%lines = ();
# Remove duplicate
open( TMP_GL_OUTPUT, '>', $OUTPUT_FILE ) or die $!;
while ( <TMP_GL_OUTPUT> ) {
$lines{$_}++;
}
open( OUTFILE, '>', $TMPOUTPUT_FILE ) or die $!;
print OUTFILE keys %lines;
close( OUTFILE );
close( TMP_GL_OUTPUT );
Where am I going wrong? In shell it feels shorter than in Perl.
sort -u $TMPOUTPUT_FILE > $OUTPUT_FILE
As suggested by user ikegamy, I did the following:
move $OUTPUT_FILE, $TMPOUTPUT_FILE; # Move file
run [ 'sort', '-u', '--', $TMPOUTPUT_FILE ], '>', $OUTPUT_FILE; # Remove duplicate
unlink $TMPOUTPUT_FILE;
I think you are asking why your Perl program is longer than your shell script.
First of all, your shell script does something completely different than your Perl program.
Your shell script executes a program, and stores its output in a file.
Your Perl program reads a file, manipulates the data it read, and stores the output in a file.
The Perl equivalent to
sort -u -- "$TMPOUTPUT_FILE" > "$OUTPUT_FILE"
is
use IPC::Run qw( run );
run [ 'sort', '-u', '--', $TMPOUTPUT_FILE ], '>', $OUTPUT_FILE;
(There are differences in error handling between these two.)
They're not that different in length.
This brings up the second difference. The shell specializes in executing programs, but Perl is a general purpose language. It would be surprising if it wasn't longer in Perl!
(Now try comparing the size of your Perl program to the source of sort...)
List::Util is a core module (uniq needs version 1.45 or later).
use List::Util 'uniq';
print for uniq <>
Your code looks almost OK.
My only suggestion is to chomp each line before you
save it as a key in the hash.
The reason is that the last line, if it is not terminated
with a \n, may look just the same as one of the previous lines,
but without chomp the previous line still contains
the terminating \n, whereas the last one does not.
The result is that these two lines become different keys in the hash.
Compare my example program (working, presented below) with yours, there are
no other significant differences, apart from reading from __DATA__ and
writing to the console.
In my program, for demonstration purposes, I put 2 variants of printout,
one with key values (repetition counts) and another, printing just keys.
In your program leave only the second printout.
use strict; use warnings; use feature qw(say);
my %lines;
while(<DATA>) {
chomp;
$lines{$_}++;
}
while(my($key, $val) = each %lines) {
printf "%-32s / %d\n", $key, $val;
}
say '========';
foreach my $key (keys %lines) {
say $key;
}
__DATA__
10/10/2017 00:01:39:000;Sagitter
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
10/12/2017 00:09:00:459;Sagitter
10/13/2017 01:11:03:009;Lupus
12/13/2017 04:29:00:609;Ariet
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
Edit
Your code assigns no values to $OUTPUT_FILE and $TMPOUTPUT_FILE,
and you didn't even declare these variables, but I assume that in your actual
code you did.
Another detail is that %lines should be preceded with my;
otherwise, since you put use strict;, the compiler prints an error.
Edit 2
There is a quicker and shorter solution than yours.
Instead of writing lines to a hash and printing them only in
a second step, you can do it in a single loop:
1. Read the line.
2. Check whether the hash already contains a key equal to the line just read.
3. If not, then:
   - write the line to the hash, to block the printout if the
     same line occurs again,
   - print the line.
You can even write this program as a Perl one-liner:
perl -lne"print if !$lines{$_}++" input.txt
If you run the above command from the Windows cmd, it will print the output
to the console. If you use Linux, use apostrophes instead of double quotes, so that the shell does not try to expand $lines and $_ itself.
You may of course redirect the output to any file, adding > output.txt to
the above command.
The code is executed for each input line, chomped due to -l option.
If any other details concerning Perl one-liners are not known to you, search the web.
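As a script rather than a one-liner, the same "print a line only the first time it is seen" idea might look like the sketch below. The sub name unique_lines is an illustrative assumption, and the input is a few lines taken from the question instead of a real file; unlike printing keys %lines, this keeps the lines in their original order.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns its arguments with later duplicates dropped,
# keeping the order of first appearance.
sub unique_lines {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# A few lines from the question, already chomped.
my @input = (
    '10/10/2017 00:01:39:000;Sagitter',
    '10/11/2017 00:00:01:002;Lupus',
    '10/12/2017 00:03:14:109;Leon',
    '10/11/2017 00:00:01:002;Lupus',
    '10/12/2017 00:03:14:109;Leon',
);

my @unique = unique_lines(@input);
print "$_\n" for @unique;
```

To use it on a file, read the lines from a filehandle opened for reading, chomp them, and print the result to a second filehandle opened for writing.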

Perl, find a match and read next line in perl

I would like to use
myscript.pl targetfolder/*
to read some number from ASCII files.
myscript.pl
@list = <@ARGV>;
# Is the whole file or only 1st line is loaded?
foreach $file ( @list ) {
open (F, $file);
}
# is this correct to judge if there is still file to load?
while ( <F> ) {
match_replace()
}
sub match_replace {
# if I want to read the 5th line in downward, how to do that?
# if I would like to read multi lines in multi array[row],
# how to do that?
if ( /^\sName\s+/ ) {
$name = $1;
}
}
I would recommend a thorough read of perlintro - it will give you a lot of the information you need. Additional comments:
Always use strict and warnings. The first will enforce some good coding practices (like for example declaring variables), the second will inform you about potential mistakes. For example, one warning produced by the code you showed would be readline() on unopened filehandle F, giving you the hint that F is not open at that point (more on that below).
@list = <@ARGV>;: This is a bit tricky, I wouldn't recommend it - you're essentially using glob, and expanding targetfolder/* is something your shell should be doing, and if you're on Windows, I'd recommend Win32::Autoglob instead of doing it manually.
foreach ... { open ... }: You're not doing anything with the files once you've opened them - the loop to read from the files needs to be inside the foreach.
"Is the whole file or only 1st line is loaded?" open doesn't read anything from the file, it just opens it and provides a filehandle (which you've named F) that you then need to read from.
I'd strongly recommend you use the more modern three-argument form of open and check it for errors, as well as use lexical filehandles since their scope is not global, as in open my $fh, '<', $file or die "$file: $!";.
"is this correct to judge if there is still file to load?" Yes, while (<$filehandle>) is a good way to read a file line-by-line, and the loop will end when everything has been read from the file. You may want to use the more explicit form while (my $line = <$filehandle>), so that your variable has a name, instead of the default $_ variable - it does make the code a bit more verbose, but if you're just starting out that may be a good thing.
match_replace(): You're not passing any parameters to the sub. Even though this code might still "work", it's passing the current line to the sub through the global $_ variable, which is not a good practice because it will be confusing and error-prone once the script starts getting longer.
if (/^\sName\s+/){$name = $1;}: Since you've named the sub match_replace, I'm guessing you want to do a search-and-replace operation. In Perl, that's called s/search/replacement/, and you can read about it in perlrequick and perlretut. As for the code you've shown, you're using $1, but you don't have any "capture groups" ((...)) in your regular expression - you can read about that in those two links as well.
"if I want to read the 5th line in downward , how to do that ?" As always in Perl, There Is More Than One Way To Do It (TIMTOWTDI). One way is with the range operator .. - you can skip the first through fourth lines by saying next if 1..4; at the beginning of the while loop, this will test those line numbers against the special $. variable that keeps track of the most recently read line number.
"and if I would like to read multi lines in multi array[row], how to do that ?" One way is to use push to add the current line to the end of an array. Since keeping the lines of a file in an array can use up more memory, especially with large files, I'd strongly recommend making sure you think through the algorithm you want to use here. You haven't explained why you would want to keep things in an array, so I can't be more specific here.
So, having said all that, here's how I might have written that code. I've added some debugging code using Data::Dumper - it's always helpful to see the data that your script is working with.
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dumper; # for debugging
$Data::Dumper::Useqq=1;
for my $file (@ARGV) {
print Dumper($file); # debug
open my $fh, '<', $file or die "$file: $!";
while (my $line = <$fh>) {
next if 1..4;
chomp($line); # remove line ending
match_replace($line);
}
close $fh;
}
sub match_replace {
my ($line) = @_; # get argument(s) to sub
my $name;
if ( $line =~ /^\sName\s+(.*)$/ ) {
$name = $1;
}
print Data::Dumper->Dump([$line,$name],['line','name']); # debug
# ... do more here ...
}
The above code is explicitly looping over @ARGV and opening each file, and I did say above that more verbose code can be helpful in understanding what's going on. I just wanted to point out a nice feature of Perl, the "magic" <> operator (discussed in perlop under "I/O Operators"), which will automatically open the files in @ARGV and read lines from them. (There's just one small thing, if I want to use the $. variable and have it count the lines per file, I need to use the continue block I've shown below, this is explained in eof.) This would be a more "idiomatic" way of writing that first loop:
while (<>) { # reads line into $_
next if 1..4;
chomp; # automatically uses $_ variable
match_replace($_);
} continue { close ARGV if eof } # needed for $. (and range operator)
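To experiment with the range operator and $. without touching real files, here is a small sketch. It assumes an in-memory filehandle in place of a file from @ARGV, and the sub name lines_from_fifth is made up for this illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of $text from the 5th line onward.
sub lines_from_fifth {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @kept;
    while (my $line = <$fh>) {
        next if 1 .. 4;    # constant operands are compared against $.
        chomp $line;
        push @kept, $line;
    }
    close $fh;             # closing the handle also resets $.
    return @kept;
}

my @kept = lines_from_fifth(join '', map { "line $_\n" } 1 .. 7);
print "$_\n" for @kept;
```

Collecting the lines with push, as here, is also one way to "read multiple lines into an array", with the memory caveat mentioned above.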

Perl substitute with regex

When I run this command as a Perl one-liner, it picks up the regular expression -
so that can't be bad.
more tagcommands | perl -nle 'print /(\d{8}_\d{9})/' | sort
12012011_000005769
12012011_000005772
12162011_000005792
12162011_000005792
But when I run this script over the command invocation below, it does not pick up the
regex.
#!/usr/bin/perl
use strict;
my $switch="12012011_000005777";
open (FILE, "more /home/shortcasper/work/tagcommands|");
my @array_old = (<FILE>) ;
my @array_new = @array_old ;
foreach my $line(@array_new) {
$line =~ s/\d{8}_\d{9}/$switch/g;
print $line;
sleep 1;
}
This is the data that I am feeding into the script
/CASPERBOT/START URL=simplefile:///data/tag/squirrels/squirrels /12012011_000005777N.dart.gz CASPER=SeqRashMessage
/CASPERBOT/ADDSERVER simplefile:///data/tag/squirrels/12012011_0000057770.dart.trans.gz
/CASPERRIP/newApp multistitch CASPER_BIN
/CASPER_BIN/START URLS=simplefile:///data/tag/squirrels /12012011_000005777R.rash.gz?exitOnEOF=false;binaryfile:///data/tag/squirrels/12162011_000005792D.binaryBlob.gz?exitOnEOF=false;simplefile:///data/tag/squirrels/12012011_000005777E.bean.trans.gz?exitOnEOF=false EXTRACTORS=rash;island;rash BINARY=T
You should study your one-liner to see how it works. First check perl -h to learn about the switches used:
-l[octal] enable line ending processing, specifies line terminator
-n assume "while (<>) { ... }" loop around program
The first one is not exactly self-explanatory, but what -l actually does is chomp each line, and then change $\ and $/ to newline. So, your one-liner:
perl -nle 'print /(\d{8}_\d{9})/'
Actually does this:
$\ = "\n";
while (<>) {
chomp;
print /(\d{8}_\d{9})/;
}
A very easy way to see this is to use the B::Deparse backend, via -MO=Deparse:
$ perl -MO=Deparse -nle 'print /(\d{8}_\d{9})/'
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
print /(\d{8}_\d{9})/;
}
-e syntax OK
So, that's how you transform that into a working script.
I have no idea how you went from that to this:
use strict;
my $switch="12012011_000005777";
open (FILE, "more /home/shortcasper/work/tagcommands|");
my @array_old = (<FILE>) ;
my @array_new = @array_old ;
foreach my $line(@array_new) {
$line =~ s/\d{8}_\d{9}/$switch/g;
print $line;
sleep 1;
}
First of all, why are you opening a pipe from the more command to read a text file? That is like calling a tow truck to fetch you a cab. Just open the file. Or better yet, don't. Just use the diamond operator, like you did the first time.
You don't need to first copy the lines of a file to an array, and then use the array. while(<FILE>) is a simple way to do it.
In your one-liner, you print the regex. Well, you print the return value of the regex. In this script, you print $line. I'm not sure how you thought that would do the same thing.
Your substitution here will replace every matching set of numbers with the one stored in $switch. Nothing else.
You should also be aware that sleep 1 may not do what you think. Try this one-liner with its output piped into another program, for example:
perl -we 'for (1 .. 10) { print "line $_\n"; sleep 1; }' | cat
As you will notice, it will simply wait 10 seconds then print everything at once. That's because perl buffers its standard output when it is not connected to a terminal, and that buffer is not written out until it is full or flushed (for instance when the perl execution ends). When the output goes straight to a terminal it is line-buffered, so each line ending in a newline shows up immediately. So, it's a perception problem. Everything works like it should, you just don't see it.
If you absolutely want to have a sleep statement in your script, you'll probably want to autoflush, e.g. STDOUT->autoflush(1);
However, why are you doing that? Is it so you will have time to read the numbers? If so, put that more statement at the end of your one-liner instead:
perl ...... | more
That will pipe the output into the more command, so you can read it at your own pace. Now, for your one-liner:
Always also use -w, unless you specifically want to avoid getting warnings (which basically you never should).
Your one-liner will only print the first match. If you want to print all the matches on a new line:
perl -wnle 'print for /(\d{8}_\d{9})/g'
If you want to print all the matches, but keep the ones from the same line on the same line:
perl -wnle 'print "@a" if @a = /(\d{8}_\d{9})/g'
Well, that should cover it.
Your open call may be failing (you should always check the result of an open to make sure it succeeded if the rest of the program depends on it) but I believe your problem is in complicating things by opening a pipe from a more command instead of simply opening the file itself. Change the open to simply
open FILE, "/home/shortcasper/work/tagcommands" or die $!;
and things should improve.
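To make the difference between "print what the match returns" and "print the line after a substitution" concrete, here is a minimal sketch. The sub names and the replacement value are assumptions for illustration only; the sample line is taken from the data in the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $id_pattern = qr/\d{8}_\d{9}/;

# Return every ID found in a line (a match with /g in list context).
sub extract_ids {
    my ($line) = @_;
    my @ids = $line =~ /($id_pattern)/g;
    return @ids;
}

# Return a copy of the line with every ID replaced by $switch.
sub replace_ids {
    my ($line, $switch) = @_;
    (my $copy = $line) =~ s/$id_pattern/$switch/g;
    return $copy;
}

my $sample = '/CASPERBOT/ADDSERVER simplefile:///data/tag/squirrels/12012011_0000057770.dart.trans.gz';

print join(' ', extract_ids($sample)), "\n";               # what the one-liner prints
print replace_ids($sample, '99999999_000000001'), "\n";    # what the script prints
```

In a real script both subs would be called inside a while loop reading the file (or <>) line by line, with no need for the intermediate arrays or the more pipe.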

STDOUT to array Perl

I am writing a Perl program that writes its output (STDOUT) to a file. In the same program, I want to run another small script, using a while loop, on that output. So I need to save the output of the first script in an array, which I can then use in while<@array>. Like
open(File,"text.txt");
open(STDOUT,">output,txt");
@file_contents=<FILE>;
foreach (@file_contents){
//SCRIPT GOES HERE//
write;
}
format STDOUT =
VARIABLE @<<<<<< @<<<<<< @<<<<<<
$x $y $z
.
//Here I want to use output of above program in while loop //
while(<>){
}
How can I save the output of the first program into an array so that I can use it in a while loop, or how can I directly use STDOUT in a while loop? I have to make sure that the first part is completely executed. Thanks in advance.
Since you remapped STDOUT so it writes to a file, you could presumably close STDOUT, and then reopen the file for reading.
Quite where you're going to send any other output is a bit of a mystery, but presumably you can resolve that. Were it me, I'd not fiddle with STDOUT. I'd make the script write to a file handle:
use strict;
use warnings;
open my $input, "<", "text.txt" or die "A horrible death";
open my $output, ">", "output.txt" or die "A horrible death";
my @file_contents = <$input>;
close($input);
foreach (@file_contents)
{
# Script goes here
print $output "Any information that goes to output\n";
}
close $output;
open my $reread, "<", "output.txt" or die "A horrible death";
while (<$reread>)
{
# Process the previous output
}
Note the use of lexical file handles, the checking that the open worked, the close when finished with the input file, the use of use strict; and use warnings;. (I've only been working with Perl for 20 years and I know I don't trust my scripts until they run clean with those settings.)
I assume you want to reopen STDOUT in order to make the write function work. However, the correct solution for that is to either specify the file handle, or to a lesser extent, to use select.
write FILEHANDLE;
or
select FILEHANDLE;
write;
Unfortunately, it seems the IO of perlform is a bit arcane, and does not seem to allow for lexical file handles.
Your problem is you can't reuse the formatted text within the program, so a bit of tricky programming is required. What you can do is open a file handle that prints to a scalar. That is another somewhat arcane Perl feature, but in this case, it might be the only way to do this directly.
# Using FOO as format to avoid destroying STDOUT
format FOO =
VARIABLE @<<<<<< @<<<<<< @<<<<<<
$x $y $z
.
my $foo;
use autodie; # save yourself some typing
open INPUT, '<', "text.txt"; # normally, we would add "or die $!" on these
open FOO, '>', \$foo; # but now autodie handles that for us
open my $output, '>', "output.txt";
while (<INPUT>) { # read from the INPUT handle opened above
$foo = ""; # we need to reset $foo each iteration
write FOO; # write to the file handle instead
print $output $foo; # this now prints $foo to output.txt
do_something($foo); # now you can also process the text at the same time
}
As you'll notice, we now first print the formatted line to the scalar $foo. While it is there, we can handle it as regular data, so there's no need to save to a file and reopening it to get to the data.
Each iteration, data is concatenated to the end of $foo, so to avoid accumulation, we need to reset $foo. The best way to handle this would be to make $foo lexical within the scope, but unfortunately we need $foo to be declared outside the while loop in order to be able to use it in the open statement.
It might be possible to use local $foo inside the while-loop, but I think that's adding yet more bad practice to this already very bad hack.
Conclusion:
With all this said and done, I suspect the best way to handle this is to not use perlform at all, and format your data in some other way. While perlform might be well suited to print to a file, it is not the best suited for what you have in mind. I recall this question from earlier, perhaps there was some other answer that would work better. Such as using sprintf, like Jonathan suggested
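A minimal sketch of the sprintf route, assuming three values per row: the helper name format_row and the sample data are hypothetical, and the widths only roughly mimic the picture line used in the format above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper that roughly mimics "VARIABLE @<<<<<< @<<<<<< @<<<<<<":
# each field is left-justified and cut to 7 characters.
sub format_row {
    my ($x, $y, $z) = @_;
    return sprintf "VARIABLE %-7.7s %-7.7s %-7.7s\n", $x, $y, $z;
}

# Hypothetical data standing in for the values computed from text.txt.
my @rows = ([1, 2, 3], ['alpha', 'beta', 'gamma']);

# The formatted lines stay in memory, so they can be printed and reused.
my @formatted = map { format_row(@$_) } @rows;

print @formatted;                 # first use: write the output

for my $line (@formatted) {       # second use: process the same output
    my @fields = split ' ', $line;
    print "fields: ", scalar(@fields), "\n";
}
```

Because the formatted text is an ordinary array of strings, the same lines can be printed to a file handle for output.txt and looped over again, without reopening anything.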
Assuming the output from your first program is tab-delimited:
while (<>) {
chomp $_;
my ($variable, $x, $y, $z) = split("\t", $_);
# do stuff with values
}

How do I get a filehandle from the command line?

I have a subroutine that takes a filehandle as an argument. How do I make a filehandle from a file path specified on the command line? I don't want to do any processing of this file myself, I just want to pass it off to this other subroutine, which returns an array of hashes with all the parsed data from the file.
Here's what the command line input I'm using looks like:
$ ./getfile.pl /path/to/some/file.csv
Here's what the beginning of the subroutine I'm calling looks like:
sub parse {
my $handle = shift;
my @data = <$handle>;
while (my $line = shift(@data)) {
# do stuff
}
}
Command line arguments are available in the predefined @ARGV array. You can get the file name from there and use open to open a filehandle to it. Assuming that you want read-only access to the file, you would do it this way:
my $file = shift @ARGV;
open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
parse($fh);
Note that the or die... checks whether the call to open succeeded and dies with an error message if it didn't. The built-in variable $! will contain the (OS dependent) error message on failure that tells you why the call wasn't successful, e.g. "Permission denied".
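To try the open-then-pass pattern end to end, the following sketch uses a stand-in for the asker's subroutine. Everything about this parse (naive comma splitting, a header row supplying the hash keys) is an assumption for illustration, and an in-memory filehandle replaces the file named on the command line:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the asker's parse(): the header row gives the
# keys, fields are split naively on commas (no quoting support).
sub parse {
    my ($handle) = @_;
    my $header = <$handle>;
    return () unless defined $header;
    chomp $header;
    my @keys = split /,/, $header;

    my @rows;
    while (my $line = <$handle>) {
        chomp $line;
        my %row;
        @row{@keys} = split /,/, $line;
        push @rows, \%row;
    }
    return @rows;
}

# Assumption: in-memory data instead of a real file. With a path you would
# write: open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
my $csv = "name,count\nfred,1\nbarney,2\n";
open my $fh, '<', \$csv or die "Cannot open in-memory file: $!";
my @rows = parse($fh);
close $fh;

printf "%s has count %d\n", $_->{name}, $_->{count} for @rows;
```

The subroutine only sees a filehandle, so it does not care whether the handle came from a path in @ARGV, from STDIN, or from a scalar as here.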
parse(*ARGV) is the simplest solution: the explanation is a bit long, but an important part of learning how to use Perl effectively is to learn Perl.
When you use a null filehandle (<>), it actually reads from the magical ARGV filehandle, which has special semantics: it reads from all the files named in @ARGV, or STDIN if @ARGV is empty.
From perldoc perlop:
The null filehandle <> is special: it can be used to emulate the
behavior of sed and awk. Input from <> comes either from standard
input, or from each file listed on the command line. Here’s how it
works: the first time <> is evaluated, the @ARGV array is checked, and
if it is empty, $ARGV[0] is set to "-", which when opened gives you
standard input. The @ARGV array is then processed as a list of
filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(@ARGV, '-') unless @ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn’t so cumbersome to say, and will actually work. It
really does shift the @ARGV array and put the current filename into the
$ARGV variable. It also uses filehandle ARGV internally--<> is just a
synonym for <ARGV>, which is magical. (The pseudo code above doesn’t
work because it treats <ARGV> as non-magical.)
You don't have to use <> in a while loop -- my $data = <> will read one line from the first non-empty file, my @data = <>; will slurp it all up at once, and you can pass *ARGV around as if it were a normal filehandle.
This is what the -n switch is for!
Take your parse method, and do this:
#!/usr/bin/perl -n
#do stuff
Each line is stored in $_. So you run
./getfile.pl /path/to.csv
And it does this.
See perlrun for some more info about these switches. I like -p too, and have found the combo of -a and -F to be really useful.
Also, if you want to do some extra processing, add BEGIN and END blocks.
#!/usr/bin/perl -n
BEGIN {
$accumulator = 0; # package variable; a "my" here would not be visible in END
}
# do stuff
END {
print process_total($accumulator);
}
or whatever. This is very, very useful.
Am I missing something or are you just looking for the open() call?
open($fh, "<$ARGV[0]") or die "couldn't open $ARGV[0]: $!";
do_something_with_fh($fh);
close($fh);