pattern matching in perl including files which have signs in their names - perl

Suppose I have an array of names and I want to find the corresponding files in the current directory. Pattern matching fails for those files which have a sign in their names, but for the rest everything is OK!
files:
========
A+B-C_www.txt
A-B_CC.books.#1.txt
#!/usr/perl/bin
my @a = qw(A+B-C_www A-B_CC.books.#1);
my $dir = ".";
opendir(DIR, $dir);
my @files = grep(/txt$/,readdir(DIR));
closedir(DIR);
foreach my $file (@files){
for (my $i=0, $i<=$#a, $i++)
if ($file =~ m/$a[$i]){
do some stuff ....}
else{
do some stuff .... }
}
}

This is because regular expressions have special characters with special meanings.
You can fix this with the \Q and \E escape sequences, which quote any regex metacharacters between them.
if ($file =~ m/\Q$a[$i]\E/ ) {
Also - turn on use strict; and use warnings;. I assume that the broken regex in your original code (no trailing /) is a typo.
I'd also suggest instead of readdir and grep you could do both with:
while ( my $file = glob ( "*.txt" ) ) {
Also: Your pattern match is a substring match. You may need text anchors if that's not what you intend.

Irrespective of the pattern matching issues that have already been addressed by the above answers, you could write your code more idiomatically with a grep statement instead of an inner for loop:
while ( my $file = glob('*.txt') ) {
my $has_match = grep { $file =~ m|\Q$_\E| } @a;
if ( $has_match ) {
# do something
}
else {
# do something
}
}

Use quotemeta to escape your pattern before interpolation.
my $pattern = quotemeta($a[$i]);
if ($file =~ m/$pattern/) {
}
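A minimal sketch of the difference, using one of the file names from the question. The helper name matches_literally is hypothetical and only there for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name contains $needle as a literal substring.
sub matches_literally {
    my ($name, $needle) = @_;
    return $name =~ /\Q$needle\E/ ? 1 : 0;
}

my $file   = 'A+B-C_www.txt';
my $needle = 'A+B-C_www';

# Unescaped, "+" is a quantifier ("one or more A"), so the literal "+"
# in the file name is never matched.
print "raw pattern:    ", ( $file =~ /$needle/ ? "match" : "no match" ), "\n";

# With \Q...\E every character of $needle is taken literally.
print "quoted pattern: ", ( matches_literally($file, $needle) ? "match" : "no match" ), "\n";
```

The raw pattern should report "no match" and the quoted one "match", which is the behaviour described in the question.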

Perl Programming

I have these questions, but I don't know how to check whether my answers are right. Are they?
Find all complete lines of a file which contain only a row of any number of the letter x
x*
^x+$
^x*$ <-This one
^xxxxx$
Find all complete lines of a file which contain a row consisting only the letter x but ignoring any leading or trailing space on the line.
^\s* x+\s*$ <--This one
^\s(x*)\s$
\s* x+\s*
^\s+x+\s+$
I tried to use this
use strict;
use warnings;
my $filename = 'data.txt';
open( my $fh, '<:encoding(UTF-8)', $filename ) or die "Could not open file '$filename' $!";
while ( my $row = <$fh> ) {
chomp $row;
print "$row\n";
}
I tried this code but I got error at (^
use strict;
use warnings;
my $filename = 'data.txt';
open( my $fh, '<:encoding(UTF-8)', $filename ) or die "Could not open file '$filename' $!";
while ( my $row = <$fh> ) {
if ( ^x*$ ) {
print "This is";
}
}
You're talking about regular expressions and how to use them in Perl. Your question seems to be whether the answers you picked to homework are correct.
The code you've added should do what you want, but it has syntax errors.
if ( ^x*$ ) {
print "This is";
}
Your pattern is correct, but you don't know how to use a regular expression in Perl. You're missing the actual operator to tell Perl that you want a regular expression.
The short form is this, where I've highlighted the important part with #
if ( /^x*$/ ) {
     #    #
The slashes // tell Perl that it should match a pattern. The long form of it is:
if ( $_ =~ m/^x*$/ ) {
     ## ## ##    #
$_ is the variable that you are matching against a pattern. The =~ is the matching operator. The m// constructs a pattern to match with. If you use // you can leave out the m, but it's clearer to put it in.
The $_ is called topic. It's like a default variable that stuff goes into in Perl if you don't specify another variable.
while ( <$fh> ) {
print $_ if $_ =~ m/foo/; # print all lines that contain foo
}
This code can be written without $_, because a lot of commands in Perl assume that you mean $_ when you don't explicitly name a variable.
while ( <$fh> ) { # puts each line in $_
print if m/foo/; # prints $_ if $_ contains foo
}
Your code looks like you wanted to do that, but in fact you have a $row in your loop. That's good, because it is more explicit. That means it's easier to read. So what you need to do for your match is:
while ( my $row = <$fh> ) {
if ( $row =~ m/^x*$/ ) {
print "This is";
}
}
Now you will iterate over each line of the file behind the $fh filehandle, and check if it matches the pattern ^x*$. If it does, you print "This is". That doesn't sound very useful.
Consider this example, where I am using the __DATA__ section instead of a file.
use strict;
use warnings;
while ( my $row = <DATA> ) {
if ( $row =~ m/^x*$/ ) {
print "This is";
}
}
__DATA__
foo
xxx
x
xxxxx
bar
This will print:
This isThis isThis isThis is
It really does not seem to be very useful. It would make more sense to include the line that matched.
if ( $row =~ m/^x*$/ ) {
print "match: $row";
}
Now we get this:
match: xxx
match:
match: x
match: xxxxx
That's almost what we expected. It matches a single x, and a bunch of xs. It did not match foo or bar. But it does match an empty line.
That's because you picked the wrong pattern.
The * multiplier means match as many as possible, at least none.
The + multiplier means match as many as possible, at least one.
So your pattern should be the one with +, or it will match if there is nothing, because start of the line, no x, end of the line matches an empty line.
While you're at it, you could also rename your variable. Unless you're dealing with CSV, which has rows of data, you have lines, not rows. So $line would be a better name for your variable. Giving variables good, descriptive names is very important because it makes it easier to understand your program.
use strict;
use warnings;
my $filename = 'data.txt';
open( my $fh, '<:encoding(UTF-8)', $filename )
or die "Could not open file '$filename' $!";
while ( my $line = <$fh> ) {
if ( $line =~ m/^x+$/ ) {
print "match: $line";
}
}
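The same reasoning can be applied to the second homework question. The sketch below assumes that "a row of the letter x" means at least one x, so it uses + rather than *. The helper name is_x_row and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line is a run of x's, ignoring leading
# and trailing whitespace (assumes at least one x is required).
sub is_x_row {
    my ($line) = @_;
    return $line =~ /^\s*x+\s*$/ ? 1 : 0;
}

# Sample lines, each ending in a newline as they would when read from a file.
for my $line ( "xxx\n", "  xx  \n", "\n", " x y \n", "foo\n" ) {
    ( my $shown = $line ) =~ s/\n\z//;    # strip the newline for display only
    printf "%-10s %s\n", "'$shown'", is_x_row($line) ? "match" : "no match";
}
```

Running a candidate pattern against a handful of lines like this is a quick way to check a homework answer, including edge cases such as the empty line.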

perl overload file name download

I need to be able to offer files for download, but I have to read and print the file in my CGI. I tried to go for:
#!/usr/bin/perl -w
use strict;
push( @INC, $lib_directory );
require 'lib_utils.pl';
dl_file('/tmp/final.pdf');
as main page (dl.pl) and
sub dl_file {
my ($file) = @_;
if ( ! -e $file) {
print "file does not exist";
return 0;
}
my $content = read_file( $file, binmode => ':utf8' ) ;
$file =~ m#(.*)([^/]*)$#;
my $directory = $1;
my $filename = $2;
chdir $directory;
my $form = new CGI;
print $form->header(
-type => 'application/octet-stream',
-attachment => $filename,
-filename => $filename,
-Content-Disposition => "attachment; filename=$filename",
);
$form->print($content);
return 1;
}
for the called function. Funny thing is, this code works just fine if I don't use a sub and keep all the code in dl.pl, BUT as soon as I move the code into a sub, the downloaded file is named after the script (i.e. dl.pl).
How would you change it, or how would you do it?
Thanks in advance for your help
Your line
$file =~ m#(.*)([^/]*)$#
will leave $1 containing the whole of $file and $2 empty. You need a slash in there somewhere, probably like this
$file =~ m#(.*)/([^/]*)$#
It would also make sense to make the directory optional, like so
$file =~ m#(?:(.*)/)?([^/]*)$#
my $directory = $1;
and you would have to write
chdir $directory if $directory
This is what's tripping you up:
$file =~ m#(.*)([^/]*)$#;
It looks like you're trying to split "/tmp/final.pdf" into directory and file name, but that pattern doesn't do that. Printing the two captures:
print "F:",$filename,"\n";
print "D:",$directory,"\n";
gives this output:
F:
D:/tmp/final.pdf
This is why you have the problem - you don't have a filename, so it defaults to using the script name.
I would suggest instead you want:
my ( $directory, $filename ) = ( $file =~ m,(.*/)([\.\w]+)$, );
This gives:
F:final.pdf
D:/tmp/
As has been said, you're suffering from the greedy matching of .* which will eat up the entire string:
$file =~ m{(.*)([^/]*)$};
There are three easy solutions to this
1. Boundary Conditions
As has been stated, you can add a boundary condition that limits how much .* can match:
$file =~ m{(?:(.*)/)?([^/]*)$};
my $dir = $1 // '';
my $filename = $2;
Or this somewhat convoluted lookbehind assertion can also enforce a boundary:
$file =~ m{(.*)(?<![^/])([^/]*)$};
my $dir = $1;
my $filename = $2;
2. Non-greedy matching
However, the simplest regex solution is to use non-greedy matching .*?:
$file =~ m{(.*?)([^/]*)$};
my ($dir, $filename) = ($1, $2);
Basically, anytime you're about to put .* anywhere, check your assumptions. The majority of the time you'll actually want .*? instead.
3. Module for parsing file paths
The bonus option is just to use a module like File::Spec for parsing file path information
use File::Spec;
my ($vol, $dirs, $filename) = File::Spec->splitpath( $file );
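A short sketch comparing the original greedy pattern, the version with a slash between the captures, and the module approach. File::Spec::Unix is used here instead of File::Spec as an assumption, so that the result does not depend on the platform the script runs on; the variable names are illustrative.

```perl
use strict;
use warnings;
use File::Spec::Unix;

my $file = '/tmp/final.pdf';

# Greedy (.*) swallows the whole path, leaving the second capture empty.
my ($greedy_dir, $greedy_name) = $file =~ m{(.*)([^/]*)$};

# Requiring a slash between the captures forces the split at the last "/".
my ($dir, $name) = $file =~ m{(?:(.*)/)?([^/]*)$};

# The module does the same job without a hand-written pattern.
my (undef, $spec_dir, $spec_name) = File::Spec::Unix->splitpath($file);

print "greedy: dir='$greedy_dir' name='$greedy_name'\n";
print "fixed:  dir='$dir' name='$name'\n";
print "module: dir='$spec_dir' name='$spec_name'\n";
```

Note that the regex version yields the directory without its trailing slash, while splitpath keeps it.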

Unmatched ) in reg when using lc function

I am trying to run the following code:
$lines = "Enjoyable )) DAY";
$lines =~ lc $lines;
print $lines;
It fails on the second line where I get the error mentioned in the title. I understand the brackets are causing the trouble. I think I could use "quotemeta", but the thing is that my string contains info that I go on to process later, so I would like to keep the string intact as far as possible and not tamper with it too much.
You have two problems here.
1. =~ is used to execute a specific set of operations
The =~ operator is used to either match with //, m//, qr// or a string; or to substitute with s/// or tr///.
If all you want to do is lowercase the contents of $lines then you should use = not =~.
$lines = "Enjoyable )) DAY";
$lines = lc $lines;
print $lines;
2. Regular expressions have special characters which must be escaped
If you want to match $lines against a lower case version of $lines, which should return true if $lines was already entirely lower case and false otherwise, then you need to escape the ")" characters.
#!/usr/bin/env perl
use strict;
use warnings;
my $lines = "enjoyable )) day";
if ($lines =~ lc quotemeta $lines) {
print "lines is lower case\n";
}
print $lines;
Note this is a toy example trying to find a reason for doing $lines =~ lc $lines - It would be much better (faster, safer) to solve this with eq as in $lines eq lc $lines.
See perldoc -f quotemeta or http://perldoc.perl.org/functions/quotemeta.html for more details on quotemeta.
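A minimal sketch of the eq approach mentioned above, which avoids the regex engine entirely and so never trips over the ")" characters. The helper name is_lower_case is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string is unchanged by lc(), i.e. it
# contains no upper-case characters.
sub is_lower_case {
    my ($text) = @_;
    return $text eq lc($text) ? 1 : 0;
}

for my $lines ( "Enjoyable )) DAY", "enjoyable )) day" ) {
    printf "%-20s %s\n", "'$lines'", is_lower_case($lines) ? "lower case" : "not lower case";
}
```

Because eq compares plain strings, the original text is left untouched and no escaping is needed.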
=~ is used for regular expressions. "lc" is not part of regex, it's a function like this: $new = lc($old);
I don't recall the regex operator for lowercase, because I use lc() all the time.

Perl - Use of uninitialized value in string

I started teaching myself Perl, and with the help of some Googling, I was able to throw together a script that would print out the file extensions in a given directory. The code works well, however, it will sometimes complain the following:
Use of uninitialized value $exts[xx] in string eq at get_file_exts.plx
I tried to correct this by initializing my array as follows: my @exts = (); but this did not work as expected.
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
#Check for correct number of arguments
if(@ARGV != 1) {
print "ERROR: Incorrect syntax...\n";
print "Usage: perl get_file_exts.plx <Directory>\n";
exit 0;
}
#Search through directory
find({ wanted => \&process_file, no_chdir => 1 }, @ARGV);
my @exts;
sub process_file {
if (-f $_) {
#print "File: $_\n";
#Get extension
my ($ext) = $_ =~ /(\.[^.]+)$/;
#Add first extension
if(scalar @exts == 0) {
push(@exts, $ext);
}
#Loop through array
foreach my $index (0..$#exts) {
#Check for match
if($exts[$index] eq $ext) {
last;
}
if($index == $#exts) {
push(@exts, $ext);
}
}
} else {
#print "Searching $_\n";
}
}
#Sort array
@exts = sort(@exts);
#Print contents
print ("@exts", "\n");
You need to test if you found an extension.
Also, you should not be indexing your array, and you do not need to manage the first push separately; just push when needed. Indexing is not the Perl way. Your subroutine should start like this:
sub process_file {
if (-f $_) {
#print "File: $_\n";
#Get extension
my ($ext) = $_ =~ /(\.[^.]+)$/;
# If we found an extension, and we have not seen it before, add it to @exts
if ($ext) {
#Loop through array to see if this is a new extension
my $newExt = 1;
for my $seenExt (@exts) {
#Check for match
if ($seenExt eq $ext) {
$newExt = 0;
last;
}
}
if ($newExt) {
push @exts,$ext;
}
}
}
}
But what you really want to do is to use a hash table to record if you saw an extension
# Move this before find(...); if you want to initialize it or you will clobber the
# contents
my %sawExt;
sub process_file {
if (-f $_) {
#print "File: $_\n";
# Get extension
my ($ext) = $_ =~ /(\.[^.]+)$/;
# If we have an extension, mark that we've seen it
$sawExt{$ext} = 1
if $ext;
}
}
# Print the extensions we've seen in sorted order
print join(' ',sort keys %sawExt) . "\n";
Or even
sub process_file {
if (-f $_ && $_ =~ /(\.[^.]+)$/) {
$sawExt{$1} = 1;
}
}
Or
sub process_file {
$sawExt{$1} = 1
if -f && /(\.[^.]+)$/;
}
Once you start thinking in Perl this is the natural way to write it
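A self-contained sketch of the hash idea, working on a plain list of file names rather than on File::Find so that it can be run anywhere. The helper name unique_extensions and the sample names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the sorted, de-duplicated extensions found in
# a list of file names. Names without a dot (e.g. README) are skipped, so
# no undef ever reaches the hash.
sub unique_extensions {
    my @names = @_;
    my %seen;
    for my $name (@names) {
        $seen{$1} = 1 if $name =~ /(\.[^.]+)$/;
    }
    return sort keys %seen;
}

my @exts = unique_extensions(qw(a.txt b.pl README c.txt archive.tar.gz));
print "@exts\n";
```

The hash keys are unique by construction, so no inner loop or index bookkeeping is needed.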
The warning is complaining about the contents of $exts[xx], not @exts itself.
In fact $ext can be undef when the file name doesn't match your regexp, for instance README.
Try like:
my ($ext) = $_ =~ /(\.[^.]+)$/ or return;
The main problem is that you aren't accounting for file names that don't contain a dot, so
my ($ext) = $_ =~ /(\.[^.]+)$/;
sets $ext to undef.
Despite the warning, processing continues by evaluating undef as the null string, failing to find that in @exts, and so percolating undef to the array as well.
The minimal change to get your code working is to replace
my ($ext) = $_ =~ /(\.[^.]+)$/;
with
return unless /(\.[^.]+)$/;
my $ext = $1;
But there are a couple of Perl lessons to be learned here. It used to be taught that good programs were well-commented programs. That was in the days of having to write efficient but incomprehensible code, but is no longer true. You should write code that is as clear as possible, and add comments only if you absolutely have to write something that isn't self-explanatory.
You should remember and use Perl idioms, and try to forget most C that you knew. For instance, Perl accepts the "here document" syntax, and it is common practice to use or and and as short-circuit operators. Your parameter check becomes
@ARGV or die <<END;
ERROR: Incorrect syntax...
Usage: perl get_file_exts.plx <Directory>
END
Perl allows for clear but concise programming. This is how I would have written your wanted subroutine
sub process_file {
return unless -f and /(\.[^.]+)$/;
my $ext = $1;
foreach my $index (0 .. $#exts) {
return if $exts[$index] eq $ext;
}
push #exts, $ext;
}
Use exists on $exts[xx] before accessing it.
exists is deprecated though, as @chrsblck pointed out:
Be aware that calling exists on array values is deprecated and likely
to be removed in a future version of Perl.
But you should be able to check that it exists (and is not 0 or "") simply with:
if($exts[$index] && $exts[$index] eq $ext){
...
}

Perl search is only showing last result

I have two arrays, one with search terms and another which is multiple lines fetched from a file. I have a nested foreach statement and am searching for all combinations, but only the very last match is showing even though I know for a fact that there are many other matches!! I have tried many different versions of the code but here is my last one:
open (MYFILE, 'searchTerms.txt');
open (MYFILE2, 'fileToSearchIn.xml');
@searchTerms = <MYFILE>;
@xml = <MYFILE2>;
close(MYFILE2);
close(MYFILE);
$results = "";
foreach $searchIn (@xml)
{
foreach $searchFor (@searchTerms)
{
#print "searching for $searchFor in: $searchIn\n";
if ($searchIn =~ m/$searchFor/)
{
$temp = "found in $searchIn \n while searching for: $searchFor ";
$results = $results.$temp."\n";
$temp = "";
}
}
}
print $results;
You should always use strict and use warnings at the start of your program, and declare all variables at the point of their first use using my. This applies especially when you are asking for help with your code as this measure can quickly reveal many simple mistakes.
As Raze2dust has said, it is important to remember that lines read from a file will have a trailing newline "\n" character. If you were checking for exact matches between a pair of lines then this wouldn't matter, but since it's not working for you I assume the strings in searchTerms.txt can appear anywhere in the lines of fileToSearchIn.xml. That means you need to chomp the strings from searchTerms.txt; lines from the other file can stay as they are.
Things like this are made a lot easier by using the File::Slurp module. It does all the file handling for you and will chomp any newlines from the input text if you ask.
I have changed your program to use this module so that you can see how it works.
use strict;
use warnings;
use File::Slurp;
my @searchTerms = read_file('searchTerms.txt', chomp => 1);
my @xml = read_file('fileToSearchIn.xml');
my @results;
foreach my $searchIn (@xml) {
foreach my $searchFor (@searchTerms) {
if ($searchIn =~ m/$searchFor/) {
push @results, qq/Found in "$searchIn"\n while searching for "$searchFor"/;
}
}
}
print "$_\n" for @results;
Chomp your inputs to remove newline characters:
open (MYFILE, 'searchTerms.txt');
open (MYFILE2, 'fileToSearchIn.xml');
@searchTerms = <MYFILE>;
@xml = <MYFILE2>;
close(MYFILE2);
close(MYFILE);
$results = "";
foreach $searchIn (@xml)
{
chomp($searchIn);
foreach $searchFor (@searchTerms)
{
chomp($searchFor);
#print "searching for $searchFor in: $searchIn\n";
if ($searchIn =~ m/$searchFor/)
{
$temp = "found in $searchIn \n while searching for: $searchFor ";
$results = $results.$temp."\n";
$temp = "";
}
}
}
print $results;
Basically, you think you are searching for 'a', but you are actually searching for "a\n", because that is how the input is read unless you use chomp. It matches only if 'a' is the last character on the line, because in that case it is followed by a newline.
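The effect can be reproduced without any files. The sketch below uses in-memory strings that still carry their trailing newline, as they would after reading from a filehandle. The helper name terms_found and the sample data are made up, and quoting the terms with \Q...\E is an extra assumption that the search terms are meant as literal text rather than as patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the search terms that occur in $line.
# Terms are chomped first and quoted so they match as literal text.
sub terms_found {
    my ($line, @terms) = @_;
    chomp @terms;
    return grep { $line =~ /\Q$_\E/ } @terms;
}

# Simulated file contents: every element still ends in "\n".
my @search_terms = ( "apple\n", "pear\n" );
my $xml_line     = "<fruit>apple</fruit><fruit>pear</fruit>\n";

# Without chomp the pattern is "apple\n", which does not occur in the line.
my @raw = grep { $xml_line =~ /$_/ } @search_terms;
print "without chomp: ", scalar(@raw), " match(es)\n";

my @found = terms_found($xml_line, @search_terms);
print "with chomp:    ", scalar(@found), " match(es): @found\n";
```

Only the search terms need chomping here; the line being searched can keep its newline, as noted in the earlier answer.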