Perl - Use of uninitialized value in string - perl

I started teaching myself Perl, and with the help of some Googling, I was able to throw together a script that would print out the file extensions in a given directory. The code works well, however, it will sometimes complain the following:
Use of uninitialized value $exts[xx] in string eq at get_file_exts.plx
I tried to correct this by initializing my array as follows: my #exts = (); but this did not work as expected.
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
#Check for correct number of arguments
if(#ARGV != 1) {
print "ERROR: Incorrect syntax...\n";
print "Usage: perl get_file_exts.plx <Directory>\n";
exit 0;
}
#Search through directory
find({ wanted => \&process_file, no_chdir => 1 }, #ARGV);
my #exts;
sub process_file {
if (-f $_) {
#print "File: $_\n";
#Get extension
my ($ext) = $_ =~ /(\.[^.]+)$/;
#Add first extension
if(scalar #exts == 0) {
push(#exts, $ext);
}
#Loop through array
foreach my $index (0..$#exts) {
#Check for match
if($exts[$index] eq $ext) {
last;
}
if($index == $#exts) {
push(#exts, $ext);
}
}
} else {
#print "Searching $_\n";
}
}
#Sort array
#exts = sort(#exts);
#Print contents
print ("#exts", "\n");

You need to test if you found an extension.
Also, you should not be indexing your array. You also do not need to manage 'push' just do it. It is not the Perl way. Your for loop should start like this:
sub process_file {
if (-f $_) {
#print "File: $_\n";
#Get extension
my ($ext) = $_ =~ /(\.[^.]+)$/;
# If we found an extension, and we have not seen it before, add it to #exts
if ($ext) {
#Loop through array to see if this is a new extension
my $newExt = 1;
for my $seenExt (#exts) {
#Check for match
if ($seenExt eq $ext) {
$newExt = 0
last;
}
}
if ($newExt) {
push #exts,$ext;
}
}
}
}
But what you really want to do is to use a hash table to record if you saw an extension
# Move this before find(...); if you want to initialize it or you will clobber the
# contents
my %sawExt;
sub process_file {
if (-f $_) {
#print "File: $_\n";
# Get extension
my ($ext) = $_ =~ /(\.[^.]+)$/;
# If we have an extension, mark that we've seen it
$sawExt{$ext} = 1
if $ext;
}
}
# Print the extensions we've seen in sorted order
print join(' ',sort keys %sawExt) . "\n";
Or even
sub process_file {
if (-f $_ && $_ =~ /(\.[^.]+)$/) {
$sawExt{$1} = 1;
}
}
Or
sub process_file {
$sawExt{$1} = 1
if -f && /(\.[^.]+)$/;
}
Once you start thinking in Perl this is the natural way to write it

The warning is complaining about a content of $exts[xx], not #exts itself.
Actually $ext can be undef, when the filename doesn't match to your regexp, for instance README.
Try like:
my ($ext) = $_ =~ /(\.[^.]+)$/ or return;

The main problem is that you aren't accounting for file names that don't contain a dot, so
my ($ext) = $_ =~ /(\.[^.]+)$/;
sets $ext to undef.
Despite the warning, processing continues by evaluating undef as the null string, failing to find that in #exts, and so percolating undef to the array as well.
The minimal change to get your code working is to replace
my ($ext) = $_ =~ /(\.[^.]+)$/;
with
return unless /(\.[^.]+)$/;
my $ext = $1;
But there is a couple of Perl lessons to be learned here. It used to be taught that good programs were well-commented programs. That was in the days of having to write efficient but incomprehensible code, but is no longer true. You should write code that is as clear as possible, and add comments only if you absolutely have to write something that isn't self-explanatory.
You should remember and use Perl idioms, and try to forget most C that you knew. For instance, Perl accepts the "here document" syntax, and it is common practice to use or and and as short-circuit operators. Your parameter check becomes
#ARGV or die <<END;
ERROR: Incorrect syntax...
Usage: perl get_file_exts.plx <Directory>
END
Perl allows for clear but concise programming. This is how I would have written your wanted subroutine
sub process_file {
return unless -f and /(\.[^.]+)$/;
my $ext = $1;
foreach my $index (0 .. $#exts) {
return if $exts[$index] eq $ext;
}
push #exts, $ext;
}

Use exists on $exts[xx] before accessing it.
exists is deprecated though as #chrsblck pointed out :
Be aware that calling exists on array values is deprecated and likely
to be removed in a future version of Perl.
But you should be able to check if it exists (and not 0 or "") simply with :
if($exts[index] && $exts[$index] eq $ext){
...
}

Related

Find file that does not contain specific string

I want to find file that does not contain the specific string?
The listed file is like below
../../../experiment/fileA.txt (contain word 'Book')
../../../experiment/fileB.txt (contain word 'Book')
../../../experiment/fileC.txt (do not contain word 'Book')
../../../experiment/fileD.txt (contain word 'Book')
Here is my code
use strict;
use warning;
my $dirname = "../../../experiment/";
my $keyword = "Book";
my #result;
my $find_file = sub {
my $F = $File::Find::name;
if ($F =~ /txt$/) {
open my $in, "<", $F or die $!;
while(<$in>) {
if (/\Q$keyword\E/){
next;
}else{
push #result, $F;
return;
}
}
}
};
find ({ wanted => $find_file, no_chdir=>1}, $dirname );
foreach my $result (#result){
chomp $result;
$result =~ s{.*/}{};
print "$result\n";
}
But it seem does not work. It display all file whether it has the $keyword or not. I only want it to display only if the file does not have the $keyword
There's a simple logic error. The code goes through lines of each file
while (<$in>) {
if (/\Q$keyword\E/){
next;
} else {
push #result, $F;
return;
}
}
and as soon as any one line doesn't have $keyword it adds the file to #result.
You need to check all lines and if $keyword is never found only then add a file. The easiest way to do this is to return from the sub as soon as the thing is found
while (<$in>) {
return if /\Q$keyword/;
}
push #result, $F;
This doesn't address your code, but I'd like to point out that with the grep command on any Linux system you can do exactly what it looks like you're trying to do with this command:
grep -L Book -R ../../../experiment/
Path::Iterator::Rule makes tasks like this really simple. As a side note, I would recommend resolving the directory to an absolute path before iterating.
use strict;
use warnings;
use Cwd 'abs_path';
use File::Basename;
use Path::Iterator::Rule;
my $dirname = abs_path "../../../experiment/";
my $keyword = "Book";
my $rule = Path::Iterator::Rule->new->not_dir->name(qr/txt$/)->not_line_match(qr/\Q$keyword\E/);
my $next = $rule->iter($dirname);
while (defined(my $file = $next->())) {
print basename($file), "\n";
}

perl foreach eat all array content

The follow code will eat content in #populations
#populations=("EUR","AFR","ASN","AMR");
print #populations,"\n"; #will show EURAFRASNAMR
foreach (#populations)
{
$filepath="tmp.txt"; #whatever text file you like
open(FILE,"<$filepath");
while(<FILE>)
{
}
}
print #populations,"\n"; #will print nothing
if change to
foreach $i (#populations)
then the array will not be eaten.
or if mark the while loop, the array will not be eaten.
I am not a perl guru, but have several years experience.
can anyone tell me why? is it a bug of perl?
Not a bug exactly, but it is a trap for the unwary. You are implicitly assigning the elements of #populations to the $_ variable, and then you are reading data from the FILE filehandle into $_, overwriting the contents of #populations.
More explicitly, your code is equivalent to:
#populations=("EUR","AFR","ASN","AMR");
print #populations,"\n"; #will show EURAFRASNAMR
foreach $_ (#populations) # $_ becomes "aliased" to the element in #populations
{
$filepath="tmp.txt"; #whatever text file you like
open(FILE,"<$filepath");
while(defined($_ = <FILE>)) # overwrites $_, and erases element in #populations
{
}
}
print #populations,"\n"; #will print nothing
You found a good workaround, which is to avoid using $_ implicitly in a for loop.
while (<$fh>) { ... }
gets replaced with
while (defined($_ = <$fh>)) { ... }
which is why the value read is available in $_. The catch is that $_ is currently aliased to an element of #populations.
As you said, you can avoid that problem by using
for my $population (#populations) {
...
while (<FILE>) { ... $_ ... }
...
}
But that can still clobber $_ from the calling sub. Here are two more robust fixes:
for (#populations) { # Or: for my $population (#populations) {
...
while (my $line = <FILE>) { ... $line ... }
...
}
or
for (#populations) { # Or: for my $population (#populations) {
...
while (local $_ = <FILE>) { ... $_ ... }
...
}
For exactly the same reason, you shouldn't use global variables for file handles.
$filepath="tmp.txt"; #whatever text file you like
open(FILE,"<$filepath");
<FILE>
should be
my $filepath="tmp.txt";
open(my $FILE, "<", $filepath) or die $!;
<$fh>
By the way, always use use strict; use warnings;.

perl array for non-zero values

Im trying to code a "service" script based on "ps".
my code:
#!/usr/bin/perl
use strict;
use warnings;
die "usage: $0 <service name>\n" unless $ARGV[0];
my $service = $ARGV[0];
open(my $ps, "ps -aux |") || die "Uknown command\n";
my #A = <$ps>;
close $ps;
foreach my $i(grep /$service/, #A){
chomp $i;
if($i=~ /root/){
next
}
print "$i\n";
}
My problem: When running the script against undef arg like:
$0 blablabla
I want to return an output if there is no such service appears/when returns 0
Thanks
I assume what you are asking is: How to give a proper message when no matching lines are found?
Well, just store the result in an array instead:
my #lines = grep { !/root/ && /$service/ } #A;
if (#lines) { # if any lines are found
for my $line (#lines) {
...
}
} else {
print "No match for '$service'!\n";
}
Or you can print the number of matches regardless of their number:
my $found = #lines;
print "Matched found: $found\n";
Note also that you can add the check for root in your grep.
As a side note, this part:
die "usage: $0 <service name>\n" unless $ARGV[0];
my $service = $ARGV[0];
Is perhaps better written
my $service = shift;
die "usage ...." unless defined $service;
Which specifically checks if the argument is defined or not, as opposed to true or not.
If I understand you correctly, you want to inform the user if no such service was found? If so, you can modify the script as follows:
my $printed; # Will be used as a flag.
foreach my $i(grep /$service/, #A){
chomp $i;
if($i=~ /root/){
next
}
$printed = print "$i\n"; # Set the flag if the service was found.
}
warn "No service found\n" unless $printed;
You can try something like this:
my #processes = grep /$service/, #A;
if ( scalar #processes ) {
foreach my $i( #processes ){
chomp $i;
if($i=~ /root/){
next;
}
print "$i\n";
}
}
else {
print 'your message';
}
You could check the result of the grep command before traversing it in the for loop, like:
...
my #services = grep { m/$service/ } #A;
# Filter the perl process running this script and...
if ( ! #services ) {
print "No service found\n";
exit 0;
}
foreach my $i( #services ){
...
}
Take into account that the grep command will never give a false return because it is including the perl process, so you will have to filter it, but I hope you get the idea.

Display And Pass Command Line Arguments in Perl

I have the following program "Extract.pl", which opens a file, finds the lines containing "warning....", "info...", "disabling..." then counts and prints the value and number of them. It is working ok.
What I want to do is to create command line arguments for each of the 3 matches - warning, disabling and infos and then run either of them from the command prompt.
Here is the code:
#!/usr/bin/perl
use strict;
use warnings;
my %warnings = ();
my %infos = ();
my %disablings = ();
open (my $file, '<', 'Warnings.txt') or die $!;
while (my $line = <$file>) {
if($line =~ /^warning ([a-zA-Z0-9]*):/i) {
++$warnings{$1};
}
if($line =~ /^disabling ([a-zA-Z0-9]*):/i) {
++$disablings{$1};
}
if($line =~ /^info ([a-zA-Z0-9]*):/i) {
++$infos{$1};
}
}
close $file;
foreach my $w (sort {$warnings{$a} <=> $warnings{$b}} keys %warnings) {
print $w . ": " . $warnings{$w} . "\n";
}
foreach my $d (sort {$disablings{$a} <=> $disablings{$b}} keys %disablings) {
print $d . ": " . $disablings{$d} . "\n";
}
foreach my $i (sort {$infos{$a} <=> $infos{$b}} keys %infos) {
print $i . ": " . $infos{$i} . "\n";
}
The builtin special array #ARGV holds all command line arguments to the script, excluding the script file itself (and the interpreter, if called as perl script.pl). In the case of a call like perl script.pl foo bar warnings, #ARGV would contain the values 'foo', 'bar', and 'warnings'. It's a normal array, so you could write something like (assuming the first argument is one of your options):
my ($warning, $info, $disabling);
if ($ARGV[0] =~ /warning/i) { $warning = 1 }
elsif ($ARGV[0] =~ /info/i) { $info = 1 }
elsif ($ARGV[0] =~ /disabling/i) { $disabling = 1 }
# [...] (opening the file, starting the main loop etc...)
if ( $warning and $line =~ /^warning ([a-zA-Z0-9]*)/i ) {
++$warnings{$1};
}
elsif ( $info and $line =~ /^info ([a-zA-Z0-9]*)/i ) {
++$infos{$1};
}
elsif ( $disabling and $line =~ /^disabling ([a-zA-Z0-9]*)/i ) {
++$disablings{$1};
}
I created flag variables for the three conditions before the main loop that goes through the file to avoid a regex compilation on every line of the file.
You could also use the Getopt::Long or Getopt::Std modules. These provide easy and flexible handling of the command line arguments.

How to create the next file or folder in a series of progressively numbered files?

Sorry for the bad title but this is the best I could do! :D
I have a script which creates a new project every time the specified function is called.
Each project must be stored in its own folder, with the name of the project. But, if you don't specify a name, the script will just name it "new projectX", where X is a progressive number.
With time the user could rename the folders or delete some, so every time the script runs, it checks for the smallest number available (not used by another folder) and creates the relevant folder.
Now I managed to make a program which I think works as wanted, but I would like to hear from you if it's OK or there's something wrong which I'm unable to spot, given my inexperience with the language.
while ( defined( $file = readdir $projects_dir ) )
{
# check for files whose name start with "new project"
if ( $file =~ m/^new project/i )
{
push( #files, $file );
}
}
# remove letters from filenames, only the number is left
foreach $file ( #files )
{
$file =~ s/[a-z]//ig;
}
#files = sort { $a <=> $b } #files;
# find the smallest number available
my $smallest_number = 0;
foreach $file ( #files )
{
if ( $smallest_number != $file )
{
last;
}
$smallest_number += 1;
}
print "Smallest number is $smallest_number";
Here's a basic approach for this sort of problem:
sub next_available_dir {
my $n = 1;
my $d;
$n ++ while -e ($d = "new project$n");
return $d;
}
my $project_dir = next_available_dir();
mkdir $project_dir;
If you're willing to use a naming pattern that plays nicely with Perl's string auto-increment feature, you can simplify the code further, eliminating the need for $n. For example, newproject000.
I think I would use something like:
use strict;
use warnings;
sub new_project_dir
{
my($base) = #_;
opendir(my $dh, $base) || die "Failed to open directory $base for reading";
my $file;
my #numbers;
while ($file = readdir $dh)
{
$numbers[$1] = 1 if ($file =~ m/^new project(\d+)$/)
}
closedir($dh) || die "Failed to close directory $base";
my $i;
my $max = $#numbers;
for ($i = 0; $i < $max; $i++)
{
next if (defined $numbers[$i]);
# Directory did not exist when we scanned the directory
# But maybe it was created since then!
my $dir = "new project$i";
next unless mkdir "$base/$dir";
return $dir;
}
# All numbers from 0..$max were in use...so try adding new numbers...
while ($i < $max + 100)
{
my $dir = "new project$i";
$i++;
next unless mkdir "$base/$dir";
return $dir;
}
# Still failed - give in...
die "Something is amiss - all directories 0..$i in use?";
}
Test code:
my $basedir = "base";
mkdir $basedir unless -d $basedir;
for (my $j = 0; $j < 10; $j++)
{
my $dir = new_project_dir($basedir);
print "Create: $dir\n";
if ($j % 3 == 2)
{
my $k = int($j / 2);
my $o = "new project$k";
rmdir "$basedir/$o";
print "Remove: $o\n";
}
}
Try this:
#!/usr/bin/env perl
use strict;
use warnings;
# get the current list of files
# see `perldoc -f glob` for details.
my #files = glob( 'some/dir/new\\ project*' );
# set to first name, in case there are none others
my $next_file = 'new project1';
# check for others
if( #files ){
# a Schwartian transform
#files = map { $_->[0] } # get original
sort { $a->[1] <=> $b->[1] } # sort by second field which are numbers
map { [ $_, do{ ( my $n = $_ ) =~ s/\D//g; $n } ] } # create an anonymous array with original value and the second field nothing but digits
#files;
# last file name is the biggest
$next_file = $files[-1];
# add one to it
$next_file =~ s/(.*)(\d+)$/$1.($2+1)/e;
}
print "next file: $next_file\n";
Nothing wrong per se, but that's an awful lot of code to achieve a single objective (get the minimum index of directories.
A core module, couple of subs and few Schwartzian transforms will make the code more flexible:
use strict;
use warnings;
use List::Util 'min';
sub num { $_[0] =~ s|\D+||g } # 'new project4' -> '4', 'new1_project4' -> '14' (!)
sub min_index {
my ( $dir, $filter ) = #_;
$filter = qr/./ unless defined $filter; # match all if no filter specified
opendir my $dirHandle, $dir or die $!;
my $lowest_index = min # get the smallest ...
map { num($_) } # ... numerical value ...
grep { -d } # ... from all directories ...
grep { /$filter/ } # ... that match the filter ...
readdir $dirHandle; # ... from the directory contents
$lowest_index++ while grep { $lowest_index == num( $_ ) } readdir $dirhandle;
return $lowest_index;
}
# Ready to use!
my $index = min_index ( 'some/dir' , qr/^new project/ );
my $new_project_name = "new project $index";