problem to read line after the first string match - perl

Not sure why this script isn't working. Can anyone suggest a fix in order to get the expected output?
Perl script:
open(INFILE,"test_input.txt")||die "can't open the file";
while(<INFILE>){
$_=~s/^\s+//;
$_=~s/\s+$//;
if ($_ =~ /work/){
<INFILE> or die "Bad file format";
my $model = INFILE;
print "line below search: $model\n";
if ($model =~/^good/){
print "found good word\n";
}
else{
print "no good word\n";
}
}
}
input file:
employment
work hard
good people
Expected Output:
line below search: good people
found good word
Actual Output:
line below search:
no good word

You were very close. To get your program running as you expected, I only had to make a couple of changes.
You have this code:
<INFILE> or die "Bad file format";
my $model = INFILE;
I think you're trying to ensure that there is actually another line after the one you've matched. But the first of those lines just reads the next line and throws the data away, and the second sets $model to the string "INFILE". Perhaps you meant my $model = <INFILE>, but even that doesn't give you what you want, as you've already read (and discarded) the record you want in the previous line of code.
So I think you want to replace those two lines with a single line.
my $model = <INFILE> or die "Bad file format";
If you're interested, here's how I'd write this. Notice, I've got rid of the hard-coded file name. Instead, I read from the generic filehandle, <>. This is automatically connected to the file whose name is passed to the program on the command line. This has the advantage that the code is a) easier to write (as you don't need to explicitly open a file) and b) more flexible (as you can process a file with any name). I've also removed a few unnecessary uses of $_ =~.
#!/usr/bin/perl
# Always use these
use strict;
use warnings;
use feature 'say'; # for 'say()'
while (<>) {
s/^\s+//; # No need for $_ =~
s/\s+$//;
if (/work/) {
my $model = <> or die "Bad file format";
if ($model =~ /^good/) {
say 'found good word';
} else {
say 'no good word';
}
}
}
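To see the read-ahead behaviour in isolation, here is a minimal sketch that runs the same idea over an in-memory filehandle holding the sample input from the question. The sub name line_after_match and the in-memory handle are assumptions made for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the line that follows the first line
# matching $pattern, or undef if there is no such line.
sub line_after_match {
    my ($fh, $pattern) = @_;
    while (my $line = <$fh>) {
        next unless $line =~ $pattern;
        my $next = <$fh>;            # a second read consumes the following line
        return unless defined $next; # matched on the last line: nothing follows
        chomp $next;
        return $next;
    }
    return;
}

# Sample input from the question, read through an in-memory filehandle.
my $input = "employment\nwork hard\ngood people\n";
open my $fh, '<', \$input or die "can't open in-memory file: $!";

my $model = line_after_match($fh, qr/work/);
die "Bad file format" unless defined $model;

print "line below search: $model\n";
print $model =~ /^good/ ? "found good word\n" : "no good word\n";
```

The key point is that every <$fh> consumes one line, so the line after the match has to be captured in a variable at the moment it is read.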

Try this.
Read the file into an array and process it line by line. You can then check the current and the next line.
my $file = 'test_input.txt';
open my $fh, "<", $file or die "can't open the file";
chomp(my @data = <$fh>);
close $fh;
for my $index (0..$#data) {
if($data[$index] =~ /work/){
my $nextIndex = $index + 1;
die "Bad file format" unless(defined $data[$nextIndex]);
print "line below search: $data[$nextIndex]\n";
if ($data[$nextIndex] =~/^good/){
print "found good word\n";
}
else{
print "no good word\n";
}
}
}

Related

How to check for a string using Perl given

I'm trying to replace a particular line in a file. I can get my program to run, but it doesn't actually do the replacing that I want it to.
Here is my sample file:
test line 1
test line 2
line to be overwritten
test line 3
Here is the code that I have:
my $origFile = $file_path . "junk\.file";
my $newFile = $file_path . "junk\.file\.backup";
# system command to make a backup of the file
system "mv $origFile $newFile";
#opens the files
open( my $INFILE, $newFile ) || die "Unable to read $newFile\n";
open( my $OUTFILE, '>' . $origFile ) || die "Unable to create $origFile\n";
# While loop to read in the file line by line
while ( <$INFILE> ) {
given ($_) {
when ("line to be overwritten") {
print $OUTFILE "line has been overwritten\n";
}
default {
print $OUTFILE $_;
}
}
}
close($INFILE);
close($OUTFILE);
I've tried to change the when statements several different ways to no avail:
when ($_ eq "line to be overwritten")
when ($_ == "line to be overwritten")
when ($_ cmp "line to be overwritten")
But those only generate errors. Anyone know what I'm doing wrong here?
As highlighted in a comment on the original question, given/when is an experimental feature of perl. I would personally recommend using if/else in a loop, and then either use string equality or a regex to match the line(s) you want to replace. A quick example:
use strict;
use warnings;
while(my $line = <DATA>) {
if ( $line =~ /line to be overwritten/ ) {
print "Overwritten\n";
} else {
print $line;
}
}
__DATA__
test line 1
test line 2
line to be overwritten
test line 3
This will give the output:
test line 1
test line 2
Overwritten
test line 3
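You could also use string equality if you aren't confident in your regex, or if the string is guaranteed to be the same. Each line read from the file still ends with a newline, so either chomp it first or include the newline in the comparison:
...
if ($line eq "line to be overwritten\n") {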
...
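As a rough sketch of the string-equality route, the following chomps each line before comparing and puts the newline back when printing. The helper name rewrite_line and the inline sample lines are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text to write out for one input line.
sub rewrite_line {
    my ($line) = @_;
    chomp $line;    # drop the trailing newline so eq compares the bare text
    return $line eq 'line to be overwritten'
        ? "line has been overwritten\n"
        : "$line\n";
}

# Stand-ins for lines read from the file; each keeps its newline, as <$fh> would return it.
my @lines = ("test line 1\n", "line to be overwritten\n", "test line 3\n");
print rewrite_line($_) for @lines;
```

Without the chomp, eq would be comparing "line to be overwritten\n" against "line to be overwritten" and would never be true.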
Sidenotes
open
On your initial open, it is recommended to use the 3 argument version of open to save from unexpected issues:
open(my $INFILE, '<', $newFile) || die "Unable to read $newFile\n";
(for more info on this, see here: http://modernperlbooks.com/mt/2010/04/three-arg-open-migrating-to-modern-perl.html)
strict & warnings
Also, it is recommended to use strict and warnings in your code file, as seen in my example above - this will save you from accidental mistakes like trying to use a variable which has not been declared, and syntax errors which may give you head-scratching results!
Experimental Features
Experimental features in Perl carry no guarantee of backwards compatibility when a new release of Perl comes out. If you use the same version of Perl everywhere, your code should stay compatible, but things may break if you update to another major version of Perl.
Answered here as I don't have the reputation to answer in the comments...
You seem to be making it way more complicated than it needs to be - a simple regex to check each line and act accordingly should do the job.
while(<$INFILE>)
{
chomp($_);
if ( /^line to be overwritten$/ )
{
print $OUTFILE "line has been overwritten\n";
}
else
{
print $OUTFILE "$_\n";
}
}
One way to do it is to use the Tie::File module, which lets you change the data right in the file. You can make the backup the same way you currently do before changing the original file.
use strict;
use warnings;
use Tie::File;
my $file = 'test.txt';
tie my @textFile, 'Tie::File', $file, recsep => "\n" or die $!;
s/line to be overwritten/line has been overwritten/ for @textFile;
untie @textFile;

Reading file line by line iteration issue

I have the following simple piece of code (identified as the problem piece of code and extracted from a much larger program).
Is it me, or can you see an obvious error in this code that is stopping it from matching against $variable and printing $found when it definitely should be?
Nothing is printed when I try to print $variable, and there are definitely matching lines in the file I am using.
The code:
if (defined $var) {
open (MESSAGES, "<$messages") or die $!;
my $theText = $mech->content( format => 'text' );
print "$theText\n";
foreach my $variable (<MESSAGES>) {
chomp ($variable);
print "$variable\n";
if ($theText =~ m/$variable/) {
print "FOUND\n";
}
}
}
I have located this as the point at which the error is occurring but cannot understand why?
There may be something I am totally overlooking, as it's very late.
Update I have since realised that I misread your question and this probably doesn't solve the problem. However the points are valid so I am leaving them here.
You probably have regular expression metacharacters in $variable. The line
if ($theText =~ m/$variable/) { ... }
should be
if ($theText =~ m/\Q$variable/) { ... }
to escape any that there are.
But are you sure you don't just want eq?
In addition, you should read from the file using
while (my $variable = <MESSAGES>) { ... }
as a for loop will unnecessarily read the entire file into memory. And please use a better name than $variable.
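To illustrate why \Q matters, here is a small sketch comparing an unquoted and a quoted match. The helper name contains_literal and the sample strings are assumptions, standing in for $theText and the lines of the messages file.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle occurs literally anywhere in $text.
sub contains_literal {
    my ($text, $needle) = @_;
    return $text =~ /\Q$needle\E/ ? 1 : 0;
}

my $text    = "sum: 1+1=2\n";
my @needles = ('1+1=2', 'sum');    # stand-ins for lines read from the messages file

for my $needle (@needles) {
    # As a bare regex, '1+1=2' means "one or more 1s, then 1=2",
    # which is not the same as the literal text.
    my $as_regex   = $text =~ /$needle/ ? 'yes' : 'no';
    my $as_literal = contains_literal($text, $needle) ? 'yes' : 'no';
    print "${needle}: regex=$as_regex literal=$as_literal\n";
}
```

If the lines in the file are meant to be plain strings rather than patterns, quoting them this way (or using index) avoids surprises from characters such as +, ., ( and ?.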
This works for me.. Am I missing the question at hand? You're just trying to match "$theText" to anything on each line in the file right?
#!/usr/bin/perl
use warnings;
use strict;
my $fh;
my $filename = $ARGV[0] or die "$0 filename\n";
open $fh, "<", $filename;
my $match_text = "whatever";
my $matched = '';
# I would use a while loop, out of habit here
#while(my $line = <$fh>) {
foreach my $line (<$fh>) {
$matched =
$line =~ m/$match_text/ ? "Matched" : "Not matched";
print $matched . ": " . $line;
}
close $fh;
./test.pl testfile
Not matched: this is some textfile
Matched: with a bunch of lines or whatever and
Not matched: whatnot....
Edit: Ah, I see.. Why don't you try printing before and after the "chomp()" and see what you get? That shouldn't be the issue, but it doesn't hurt to test each case..

Validate perl input - filter out inexistent files

I have a Perl script to which I supply input (a text file) from a batch file or sometimes from the command prompt. When I supply input from the batch file, the file sometimes does not exist. I want to catch the "No such file exists" error and do some other task when this error is thrown. Please find the sample code below.
while(<>) # here it throws an error when the file doesn't exist
{
#parse the file.
}
#if error is thrown i want to handle that error and do some other task.
Filter @ARGV before you use <>:
@ARGV = grep {-e $_} @ARGV;
die('no files') if scalar(@ARGV) == 0;
# now carry on, if we've got here there is something to do with files that exist
while(<>) {
#...
}
<> reads from the files listed in @ARGV, so if we filter that before it gets there, it won't try to read nonexistent files. I've added the check for the size of @ARGV because if you supply a list of files which are all absent, it will wait on stdin (the flipside of using <>). This assumes that you don't want to do that.
However, if you don't want to read from stdin, <> is probably a bad choice; you might as well step through the list of files in @ARGV. If you do want the option of reading from stdin, then you need to know which mode you're in:
my $have_files = scalar(@ARGV);
@ARGV = grep {-e $_} @ARGV;
die('no files') if $have_files && scalar(@ARGV) == 0;
# now carry on, if we've got here there is something to do;
# have files that exist or expecting stdin
while(<>) {
#...
}
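The filtering approach can be sketched end to end as follows. To keep the example self-contained, it creates one temporary file and pairs it with a name that should not exist; the helper name existing_files, the temporary file and the fallback message are all assumptions for illustration, not part of the original answer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: keep only the names that refer to existing files.
sub existing_files {
    return grep { -e $_ } @_;
}

# Demo setup (assumption): one real temporary file and one missing name.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "hello from the temp file\n";
close $tmp_fh or die "close: $!";

@ARGV = ($tmp_name, "$tmp_name.missing");

my $had_args = @ARGV;              # remember whether any names were given
@ARGV = existing_files(@ARGV);     # drop the names that do not exist

if ($had_args && !@ARGV) {
    # Every named file was absent: do the other task instead of waiting on STDIN.
    print "no readable input, doing the fallback task\n";
}
else {
    while (my $line = <>) {
        print $line;               # placeholder for the real parsing
    }
}
```

In a real script @ARGV would come from the command line rather than being assigned; the assignment here only makes the sketch runnable on its own.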
The diamond operator <> means:
Look at the names in @ARGV and treat them as files you want to open.
Just loop through all of them, as if they were one big file.
Actually, Perl uses the ARGV filehandle for this purpose
If no command line arguments are given, use STDIN instead.
So if a file doesn't exist, Perl gives you an error message (Can't open nonexistant_file: ...) and continues with the next file. This is what you usually want. If this is not the case, just do it manually. Stolen from the perlop page:
unshift(@ARGV, '-') unless @ARGV;
FILE: while ($ARGV = shift) {
open(ARGV, $ARGV);
LINE: while (<ARGV>) {
... # code for each line
}
}
The open function returns a false value when a problem is encountered. So always invoke open like
open my $filehandle, "<", $filename or die "Can't open $filename: $!";
The $! contains a reason for the failure. Instead of dying, we can do some other error recovery:
use feature qw(say);
@ARGV or @ARGV = "-"; # the - symbolizes STDIN
FILE: while (my $filename = shift #ARGV) {
my $filehandle;
unless (open $filehandle, "<", $filename) {
say qq(Oh dear, I can't open "$filename". What do you want me to do?);
my $tries = 5;
do {
say qq(Type "q" to quit, or "n" for the next file);
my $response = <STDIN>;
exit if $response =~ /^q/i;
next FILE if $response =~ /^n/i;
say "I have no idea what that meant.";
} while --$tries;
say "I give up" and exit!!1;
}
LINE: while (my $line = <$filehandle>) {
# do something with $line
}
}

Comparing lines in a file with perl

I've been trying to compare lines between two files and match the lines that are the same.
For some reason the code below only ever goes through the first line of 'text1.txt' and prints the 'if' statement regardless of if the two variables match or not.
Thanks
use strict;
open( <FILE1>, "<text1.txt" );
open( <FILE2>, "<text2.txt" );
foreach my $first_file (<FILE1>) {
foreach my $second_file (<FILE2>) {
if ( $second_file == $first_file ) {
print "Got a match - $second_file + $first_file";
}
}
}
close(FILE1);
close(FILE2);
If you compare strings, use the eq operator. "==" compares arguments numerically.
Here is a way to do the job if your files aren't too large.
#!/usr/bin/perl
use Modern::Perl;
use File::Slurp qw(slurp);
use Array::Utils qw(:all);
use Data::Dumper;
# read entire files into arrays
my @file1 = slurp('file1');
my @file2 = slurp('file2');
# get the common lines from the 2 files
my @intersect = intersect(@file1, @file2);
say Dumper \@intersect;
A better and faster (but less memory efficient) approach would be to read one file into a hash, and then search for lines in the hash table. This way you go over each file only once.
# This will find matching lines in two files,
# print the matching line and its line number in each file.
use strict;
open (FILE1, "<text1.txt") or die "can't open file text1.txt\n";
my %file_1_hash;
my $line;
my $line_counter = 0;
#read the 1st file into a hash
while ($line=<FILE1>){
chomp ($line); #-only if you want to get rid of 'endl' sign
$line_counter++;
if (!($line =~ m/^\s*$/)){
$file_1_hash{$line}=$line_counter;
}
}
close (FILE1);
#read and compare the second file
open (FILE2,"<text2.txt") or die "can't open file text2.txt\n";
$line_counter = 0;
while ($line=<FILE2>){
$line_counter++;
chomp ($line);
if (defined $file_1_hash{$line}){
print "Got a match: \"$line\"
in line #$line_counter in text2.txt and line #$file_1_hash{$line} at text1.txt\n";
}
}
close (FILE2);
You must re-open or reset the pointer of file 2. Move the open and close commands to within the loop.
A more efficient way of doing this, depending on file and line sizes, would be to only loop through the files once and save each line that occurs in file 1 in a hash. Then check if the line was there for each line in file 2.
If you want the number of lines,
my $count=`grep -f [FILE1PATH] -c [FILE2PATH]`;
If you want the matching lines,
my #lines=`grep -f [FILE1PATH] [FILE2PATH]`;
If you want the lines which do not match,
my #lines = `grep -f [FILE1PATH] -v [FILE2PATH]`;
This is a script I wrote that tries to see if two files are identical, although it could easily be modified by playing with the code and switching it to eq. As Tim suggested, using a hash would probably be more effective, although you couldn't ensure the files were being compared in the order they were inserted without using a CPAN module (and as you can see, this method should really use two loops, but it was sufficient for my purposes). This isn't exactly the greatest script ever, but it may give you somewhere to start.
use warnings;
open (FILE, "orig.txt") or die "Unable to open first file.\n";
@data1 = <FILE>;
close(FILE);
open (FILE, "2.txt") or die "Unable to open second file.\n";
@data2 = <FILE>;
close(FILE);
for($i = 0; $i < @data1; $i++){
$data1[$i] =~ s/\s+$//;
$data2[$i] =~ s/\s+$//;
if ($data1[$i] ne $data2[$i]){
print "Failure to match at line ". ($i + 1) . "\n";
print $data1[$i];
print "Doesn't match:\n";
print $data2[$i];
print "\nProgram Aborted!\n";
exit;
}
}
print "\nThe files are identical. \n";
Taking the code you posted, and transforming it into actual Perl code, this is what I came up with.
use strict;
use warnings;
use autodie;
open my $fh1, '<', 'text1.txt';
open my $fh2, '<', 'text2.txt';
while(
defined( my $line1 = <$fh1> )
and
defined( my $line2 = <$fh2> )
){
chomp $line1;
chomp $line2;
if( $line1 eq $line2 ){
print "Got a match - $line1\n";
}else{
print "Lines don't match: $line1 $line2\n";
}
}
close $fh1;
close $fh2;
Now what you may really want is a diff of the two files, which is best left to Text::Diff.
use strict;
use warnings;
use Text::Diff;
print diff 'text1.txt', 'text2.txt';

Why does my Perl script keep reading from same file, even though I closed it?

I'm writing a Perl script that gets two command line arguments: a directory and a year. The directory contains a large number of text files or HTML files (depending on the year). Let's say, for instance, it's the year 2010, which contains files named <number>rank.html, with the number ranging from 2001 to 2212. I want it to open each file individually, take a part of the title in the HTML file, and print it to a text file. However, when I run my code it just prints the first file's title to the text file. It seems that it only ever opens the first file, 2001rank.html, and no others. I'll post the code below, and thanks to anyone who helps.
my $directory = shift or "Must supply directory\n";
my $year = shift or "Must supply year\n";
unless (-d $directory) {
die "Error: Directory must be a directory\n";
}
unless ($directory =~ m/\/$/) {
$directory = "$directory/";
}
open COLUMNS, "> columns$year.txt" or die "Can't open columns file";
my $column_name;
for (my $i = 2001; $i <= 2212; $i++) {
if ($year >= 2009) {
my $html_file = $directory.$i."rank.html";
open FILE, $html_file;
#check if opened correctly, if not, skip it
unless (defined fileno(FILE)) {
print "skipping $html_file\n";
next;
}
$/ = "\n";
my $line = <FILE>;
if (defined $line) {
$column_name = "";
$_ = <FILE> until m{</title>};
$_ =~ m{<title>CIA - The World Factbook -- Country Comparison :: (.+)</title>}i;
$column_name = $1;
}
else {
close FILE;
next;
}
close FILE;
}
else {
my $text_file = $directory.$i."rank.txt";
open FILE, $text_file;
unless (defined fileno(FILE)) {
print "skipping $text_file\n";
next;
}
$/ = "\r";
my $line = <FILE>;
if (defined $line) {
$column_name = "";
$_ = <FILE> until /Rank/i;
$_ =~ /Rank(\s+)Country(\s+)(.+)(\s+)Date/i;
$column_name = $3;
}
else {
close FILE;
next;
}
close FILE;
}
print "Adding $column_name to text file\n";
print COLUMNS "$column_name\n";
}
close COLUMNS;
In other words $column_name gets set equal to the same thing every pass in the loop, even though I know the html files are different.
You'll probably be able to debug this a lot faster if you switch to lexical variables for your filehandles instead of globals, and turn on strict checking:
use strict;
use warnings;
while (...)
{
# ...
open my $filehandle, $html_file;
# ...
my $line = <$filehandle>;
}
This way, the filehandle(s) will go out of scope during each loop iteration, so you can more clearly see what exactly is being referenced and where. (Hint: you may have missed a condition where the filehandle gets closed, so it is improperly reused the next time around.)
For more on best practices with open and filehandles, see:
Why is three-argument open calls with autovivified filehandles a Perl best practice?
What's the best way to open and read a file in Perl?
Some other points:
Don't ever explicitly assign to $_, that's asking for trouble. Declare your own variable to hold your data: my $line = <$filehandle> (as in the example above)
Pull out your matches directly into variables, rather than using $1, $2 etc, and only use parentheses for the portions you actually need: my ($column_name) = ($line =~ m/Rank\s+Country\s+(.+)\s+Date/i);
Put the error conditions first, so the bulk of your code can be outdented one (or more) level(s). This will improve readability, as when the bulk of your algorithm is visible on the screen at once, you can better visualize what it is doing and catch errors.
If you apply the points above I'm pretty sure that you'll spot your error. I spotted it while making this last edit, but I think you'll learn more if you discover it yourself. (I'm not trying to be snooty; trust me on this!)
Your processing is similar for HTML and text files, so make your life easy and factor out the common part:
sub scrape {
my($path,$pattern,$sep) = @_;
unless (open FILE, $path) {
warn "$0: skipping $path: $!\n";
return;
}
local $/ = $sep;
my $column_name;
while (<FILE>) {
next unless /$pattern/;
$column_name = $1;
last;
}
close FILE;
($path,$column_name);
}
Then make it specific for the two types of input:
sub scrape_html {
my($directory,$i) = @_;
scrape $directory.$i."rank.html",
qr{<title>CIA - The World Factbook -- Country Comparison :: (.+)</title>}i,
"\n";
}
sub scrape_txt {
my($directory,$i) = @_;
scrape $directory.$i."rank.txt",
qr/Rank\s+Country\s+(.+)\s+Date/i,
"\r";
}
Then your main program is straightforward:
my $directory = shift or die "$0: must supply directory\n";
my $year = shift or die "$0: must supply year\n";
die "$0: $directory is not a directory\n"
unless -d $directory;
# add trailing slash if necessary
$directory =~ s{([^/])$}{$1/};
my $columns_file = "columns$year.txt";
open COLUMNS, ">", $columns_file
or die "$0: open $columns_file: $!";
for (my $i = 2001; $i <= 2212; $i++) {
my $process = $year >= 2009 ? \&scrape_html : \&scrape_txt;
my($path,$column_name) = $process->($directory,$i);
next unless defined $path;
if (defined $column_name) {
print "$0: Adding $column_name to text file\n";
print COLUMNS "$column_name\n";
}
else {
warn "$0: no column name in $path\n";
}
}
close COLUMNS or warn "$0: close $columns_file: $!\n";
Note how careful you have to be to close global filehandles. Please use lexical filehandles as in
open my $fh, $path or die "$0: open $path: $!";
Passing $fh as a parameter or stuffing it in hashes is much nicer. Also, lexical filehandles close automatically when they go out of scope. There's no chance of stomping on a handle someone else is already using.
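As a rough sketch of the lexical-filehandle style, the following reads one file and returns the first captured match, relying on scope exit to close the handle. The helper name first_capture, the temporary file and the shortened title pattern are assumptions for illustration; they are not the scrape sub from the answer above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the first capture of $pattern found in $path,
# or undef if the file cannot be opened or nothing matches.
sub first_capture {
    my ($path, $pattern) = @_;
    open my $fh, '<', $path or do {
        warn "skipping $path: $!\n";
        return;
    };
    while (my $line = <$fh>) {
        if (my ($captured) = $line =~ $pattern) {
            return $captured;
        }
    }
    return;    # $fh is closed automatically when it goes out of scope
}

# Demo setup (assumption): a small temporary HTML file.
my ($out, $name) = tempfile(UNLINK => 1);
print {$out} "<html>\n<title>Demo :: Population</title>\n</html>\n";
close $out or die "close: $!";

my $title = first_capture($name, qr{<title>Demo :: (.+)</title>}i);
print defined $title ? "$title\n" : "no title found\n";
```

Because $fh lives only inside first_capture, there is no global handle left open on an early return, which is the kind of mistake that bareword handles such as FILE make easy.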
Have you considered grep?
grep out just the line from the HTML containing the title, and then process the output of grep.
Simpler, as you would not have to write any file-handling code. You didn't say what you want with that title - if you only need a list, you might not need to write any code at all.
Try something like:
grep -ri title <directoryname>