Using chop in grep expression - perl

My Perl script searches a directory of file names, using grep to output only file names without the numbers 2-9 in their names. That means, as intended, that file names ending with the number "1" will also be returned. However, I want to use the chop function to output these file names without the "1", but can't figure out how. Perhaps the grep and chop functions can be combined in one line of code to achieve this? Please advise. Thanks.
Here's my Perl script:
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/Users/jdm/Desktop/xampp/htdocs/cnc/images/plants';
opendir(DIR, $dir);
my @files = grep (/^[^2-9]*\.png\z/,readdir(DIR));
foreach my $file (@files) {
    print "$file\n";
}
Here's the output:
Ilex_verticillata.png
Asarum_canadense1.png
Ageratina_altissima.png
Lonicera_maackii.png
Chelone_obliqua1.png
Here's my desired output with the number "1" removed from the end of file names:
Ilex_verticillata.png
Asarum_canadense.png
Ageratina_altissima.png
Lonicera_maackii.png
Chelone_obliqua.png

The number 1 to remove is at the end of the name before the extension; this is different from filtering on numbers (2-9) altogether and I wouldn't try to fit it into one operation.
Instead, once you have your filtered list (no 2-9 in names), clip off that 1. Since all names of interest end in .png, you can simply use a regex
$filename =~ s/1\.png\z/.png/;
and if there is no 1 right before .png the string is unchanged. If other extensions could be involved, you should use a module (such as the core File::Basename) to break up the filename.
To incorporate this, you can pass grep's output through a map
opendir my $dfh, $dir or die "Can't open $dir: $!";
my @files =
    map  { s/1\.png\z/.png/r }
    grep { /^[^2-9]*\.png\z/ }
    readdir $dfh;
where I've also introduced a lexical directory handle instead of a glob, and added a check on whether opendir worked. The /r modifier (available since Perl 5.14) on the substitution in map is needed so that the resulting string is returned (changed, or unchanged if the regex didn't match) instead of the original being modified in place, which is what map needs here.
This passes over the list of filenames twice, though, while one can use a straight loop. In principle that may impact performance; however, here all operations are done on each element of a list so a difference in performance is minimal.
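For what it's worth, the straight loop mentioned above might look like the sketch below. The @names list is made-up sample data standing in for the readdir output, and $clean is just an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical sample names standing in for what readdir would return.
my @names = qw(Ilex_verticillata.png Asarum_canadense1.png Rosa2.png notes.txt);

my @files;
for my $name (@names) {
    next unless $name =~ /^[^2-9]*\.png\z/;    # same filter as the grep
    (my $clean = $name) =~ s/1\.png\z/.png/;   # copy the name, then drop a trailing 1
    push @files, $clean;
}
print "$_\n" for @files;
```

This makes a single pass and does not need /r, since the substitution runs on a copy of each name.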

You could use the following:
s/1//g for @files;
It's also possible to integrate a solution into your chain using map.
my @files =
    map s/1//rg,
    grep /^[^2-9]*\.png\z/,
    readdir(DIR);
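One thing to keep in mind: s/1//g removes every 1 in the name, not only the one before the extension. A small sketch of the difference, using made-up names (Plant10.png is a hypothetical file, not one from the question):

```perl
use strict;
use warnings;

# Hypothetical names; both pass the "no digits 2-9" filter.
my @names = qw(Chelone_obliqua1.png Plant10.png);

my @global   = map { s/1//rg } @names;             # strips every 1
my @anchored = map { s/1\.png\z/.png/r } @names;   # strips only a 1 right before .png

print "global:   @global\n";
print "anchored: @anchored\n";
```

For the file names shown in the question both approaches give the same result; they only differ when a 1 appears elsewhere in a name.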

Check whether a field from a line of text matches a value

I have been using the following Perl code to extract text from multiple text files. It works fine.
Example of a couple of lines in one of the input files:
Fa0/19 CUTExyz notconnect 129 half 100 10/100BaseTX
Fa0/22 xyz MLS notconnect 1293 half 10 10/100BaseTX
What I need is to match the numbers in each line exactly (i.e. 129 is not matched by 1293) and print the corresponding lines.
It would also be nice to match a range of numbers while leaving specific numbers out, i.e. match 2 through 10, but not 11, then 12 through 20.
#!/perl/bin/perl
use warnings;
my @files = <c:/perl64/files/*>;
foreach $file ( @files ) {
open( FILE, "$file" );
while ( $line = <FILE> ) {
print "$file $line" if $line =~ /123/n;
}
close FILE;
}
Thank you for the suggestions, but can it be done using the code structure above?
I suggest that you take a look at perldoc perlre.
You need to anchor your regex pattern. The easiest way is probably using \b, which is a zero-width boundary between word characters (alphanumerics and underscore) and non-word characters.
#!/perl/bin/perl
use warnings;
use strict;
foreach my $file ( glob "c:/perl64/files/*" ) {
open( my $input, '<', $file ) or die $!;
while (<$input>) {
print "$file $_" if m/\b123\b/;
}
close $input;
}
Note - you should use three-argument open with lexical file handles as above, because it is better practice.
I've also removed the n pattern modifier, as it appears redundant.
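To see what \b buys you, here is a small sketch run against made-up strings (the @samples data is an assumption, loosely modelled on the question's lines):

```perl
use strict;
use warnings;

# Hypothetical sample lines.
my @samples = (
    "Fa0/19 CUTExyz notconnect 129 half",
    "Fa0/22 xyz notconnect 1293 half",
    "Fa0/129 unused notconnect 1 half",
);

# \b129\b needs a non-word character (or the string edge) on both sides of 129.
my @hits = grep { /\b129\b/ } @samples;
print "$_\n" for @hits;
```

The first and third lines match: 1293 is rejected, but Fa0/129 still gets through because "/" is a non-word character. A boundary-anchored regex cannot tell which column the number is in.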
Following your edit, which gives us some source data, I'd suggest the solution is not to use a regex: your source data looks space-delimited (maybe those are tabs?).
So I'd suggest you're better off using split and selecting the field you want, and testing it numerically, because you mention matching ranges. This is not a good fit for regexes because they don't understand the numeric content.
Instead:
while ( <$input> ) {
print if (split)[-4] == 129;
}
Note - I use -4 in the split, which indexes from the end of the list.
This is because column 3 contains spaces, so splitting on whitespace is going to produce the wrong result unless we count down from the end of the array. Using a negative index we get the right field each time.
If your data is tab separated then you could use chomp and split /\t/. Or potentially split on /\s{2,}/ to split on 2-or-more spaces
But by selecting the field, you can do numeric tests on it, like
if $fields[-4] > 100 and $fields[-4] < 200
etc.
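As a rough sketch of the range case from the question (2 through 20, leaving out 11), assuming the number of interest is always the fourth field from the end. The sample lines and their numbers are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical lines in the question's layout; the numbers are invented.
my @lines = (
    "Fa0/19 CUTExyz notconnect 9 half 100 10/100BaseTX",
    "Fa0/22 xyz MLS notconnect 11 half 10 10/100BaseTX",
    "Fa0/23 abc notconnect 15 half 10 10/100BaseTX",
    "Fa0/24 abc notconnect 129 half 10 10/100BaseTX",
);

my @matched;
for my $line (@lines) {
    my $number = (split ' ', $line)[-4];                     # count from the end
    next unless defined $number && $number =~ /^\d+\z/;      # skip short or non-numeric lines
    push @matched, $line
        if $number >= 2 && $number <= 20 && $number != 11;   # 2..20 except 11
}
print "$_\n" for @matched;
```

Only the lines carrying 9 and 15 are kept. The numeric comparison is what makes "a range with holes" easy to express, which is awkward to do with a regex.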
I hope you don't get the answers you're asking for, which discard best practice because of your unfamiliarity with Perl. It is inappropriate to ask how to write an ugly solution because proper Perl is beyond your reach
As has been said repeatedly on this site, if you don't know how to do a job then you should hire someone who does know and pay them for their work. No other profession that I know has the expectation of getting quality work done for free
Here's a few notes on your code. Wherever you have learned your techniques, you have been looking at a very outdated resource
Do you really have a root directory perl, so that your compiler is /perl/bin/perl? That's very unusual, and there is no need to use a shebang line in Windows
You must always add use strict and use warnings 'all' at the top of every Perl program you write, and declare all of your variables using my as close as possible to their first point of use. For some reason you do this with @files but not with $file
It is better to replace <c:/perl64/files/*> with glob 'C:/perl64/files/*'. Otherwise the code is less clear because Perl overloads the <> operator
Don't put variable names inside double quotes. It is unnecessary at best, and may cause bugs. So "$file" should be $file
Always use the three-parameter version of open, so that the second parameter is the open mode
Don't use global file handles. And always test whether the file has been opened correctly, dying with a message including $!—the reason for the failure—if the open fails
open( FILE, "$file" )
should be something like
open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!}
Don't rely on regex patterns for everything. In this case it looks like split would be a better option, or perhaps unpack if your records have fixed-width fields. In my solution below I have used split on "more than one space", but if your real data is different from what you have shown (tab-delimited?) then this is not going to work
Note that Fa0/129 will also be matched by your current approach
This Perl program filters your data, printing lines where the fourth field $fields[3] (delimited by more than one whitespace character) is numerically equal to 129
The output shown is produced when the input is the single file splitn.txt, containing the data shown in your question
use strict;
use warnings 'all';
for my $file ( glob 'C:/perl64/files/*' ) {
    open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
    while ( my $line = <$fh> ) {
        chomp $line; # chomp the line just read, not $_
        my @fields = split /\s\s+/, $line;
        print "$file $line\n" if $fields[3] == 129;
    }
}
output
splitn.txt Fa0/19 CUTExyz notconnect 129 half 100 10/100BaseTX
Your question is unclear. When you say:
What I need is to match numbers in the on each line exactly
That could mean a couple of things. It could mean that each line contains nothing but a single number which you want to match. In that case, using == is probably better than using a regular expression. Or it could mean that you have lots of text on a line and you only want to match complete numbers. In that case you should use \b (the "word boundary" anchor) - /\b123\b/.
If you're clearer in your questions (perhaps by giving us sample input) then people won't have to guess at your meaning.
A few more points on your code:
Always include both use strict and use warnings.
Always check the return value from open() and take appropriate action on failure.
Use lexical filehandles and 3-arg version of open().
No need to quote $file in your open() call.
Using $_ can simplify your code.
/n on the match operator has no effect unless your regex contains parentheses.
Putting that all together (and assuming my second interpretation of your question is correct), your code could look like this:
#!/perl/bin/perl
use strict;
use warnings;
my @files = <c:/perl64/files/*>;
foreach my $file (@files) {
    open my $file_h, '<', $file
        or die "Can't open $file: $!";
    while (<$file_h>) {
        # $_ still ends with its newline, so none is added here
        print "$file $_" if /\b123\b/;
    }
    # No need to close $file_h as it is closed
    # automatically when the variable goes out
    # of scope.
}

create new file listing all text files in a directory with perl

I am trying to list out all text files in a directory using perl. The below does run but the resulting file is empty. This seems close but maybe it is not what I need. Thank you :).
get_list.pl
#!/bin/perl
# create a list of all *.txt files in the current directory
opendir(DIR, ".");
@files = grep(/\..txt$/,readdir(DIR));
closedir(DIR);
# print all the filenames in our array
foreach $file (@files) {
    print "$file\n";
}
As written, your grep is wrong:
@files = grep(/\..txt$/,readdir(DIR));
In regular expressions - . means any character. So you will find a file called
fish.mtxt
But not a file called
fish.txt
Because of that dot.
You probably want to grep /\.txt$/, readdir(DIR)
But personally, I wouldn't bother, and just use glob instead.
foreach my $file (glob "*.txt") {
print $file,"\n";
}
Also - turn on use strict; use warnings;. Consider them mandatory until you know why you want to turn them off. (There are occasions, but you'll know what they are if you ever REALLY NEED to).
You have one excess dot:
@files = grep(/\..txt$/,readdir(DIR));
should be:
@files = grep(/\.txt$/,readdir(DIR));
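A quick way to convince yourself of the difference is to run both patterns over a few names. This is only a sketch; the names in @names are invented for the demonstration:

```perl
use strict;
use warnings;

# Hypothetical directory listing.
my @names = qw(fish.txt fish.mtxt notes.txt.bak readme.md);

my @with_extra_dot = grep { /\..txt$/ } @names;   # literal dot, ANY character, then "txt"
my @fixed          = grep { /\.txt$/  } @names;   # literal dot, then "txt"

print "extra dot: @with_extra_dot\n";
print "fixed:     @fixed\n";
```

The pattern with the extra dot picks up fish.mtxt and misses fish.txt, while the fixed pattern picks up only fish.txt.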

pattern search in all the files in a directory

I have a pattern like "keyword : Multinode", and I need to search for it in all the files in a directory. If the pattern is found in any of the files, a non-empty string should be returned. It may contain the file name or the directory name.
In shell scripting the following will do the same:
KeyMnode=`grep -w "keyword : Multinode" ${dirname}/*`
I thought of using find(subroutine,directory_path) and inside the sub-routine I want to traverse through the entire directory for all its entries. For every entry I want to put a check whether it is a readable file or not. If the file is readable, I want to search for the required pattern "keyword : Multinode" in the file found. If we hit with a success, the entire find command should result in a non-empty string(preferably only the existing directory Name) otherwise with an empty string. Please let me know if you need any further information.
I want this to be done using perl. Please help me with the solution.
Here are some Perl tools that will be useful in doing what you described:
File::Find will do a recursive search for files in a directory and its children, running code (the \&wanted callback in the docs) against each one to determine whether it meets your criteria or not
The -r operator will tell you whether a file is readable (if (-r $file_name)...)
open will get you access to the file and <$fh> will read its contents so that you can check with a regular expression whether they match your target pattern
Adding \b to the beginning and end of the pattern will cause it to match only at word boundaries, similar to grep's -w switch
If you have more specific issues, please post additional questions with code that demonstrates them, including statements both of what you expected to happen and of how the actual results differed from your expectation and we'll be happy to help resolve those issues.
Edit: Cleaned up and runnable version of code from comment:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
use File::Find;
# Get $dirname from first command-line argument
my $dirname = shift @ARGV;
# Declared before find() runs, so the callback has somewhere to store results
my ($KeyMnode, $KeyThreads);
find(\&do_process, $dirname); # quotes around $dirname weren't needed
sub do_process {
    # chomp($_); - not needed; $_ isn't read from a file, so no newline on it
    if (-r $_) { # quotes around $_ weren't needed
        # $_ is just the final part of the file name; it may be better for
        # reporting the location of matches to set $file_name to
        # $File::Find::name instead
        my $file_name = $_;
        open(my $fh, '<', $file_name) or return; # Use three-arg open! Skip files that fail to open
        while (<$fh>) {
            chomp();
            # Note that, if you store all matches into the same scalar values,
            # you'll end up with only the last value found for each pattern; you
            # may want to push the matches onto arrays instead.
            if (/\bkeyword : Multinode\b/i) { $KeyMnode = "$file_name:$_"; }
            if (/\bkeyword : Threads\b/i) { $KeyThreads = "$file_name:$_"; }
        }
    }
}
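If the goal is only the directory name (or an empty result), as the question asks, one possible variation is sketched below. dirs_with_phrase is a hypothetical helper name, and the temporary directory and the a.conf file exist only to make the example self-contained.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: returns the directories under $top that hold a readable
# file containing $phrase as whole words, or an empty list if there are none.
sub dirs_with_phrase {
    my ($top, $phrase) = @_;
    my %seen;
    find(sub {
        return unless -f $_ && -r _;          # plain, readable files only
        open my $fh, '<', $_ or return;       # quietly skip files that fail to open
        while (my $line = <$fh>) {
            if ($line =~ /\b\Q$phrase\E\b/) {
                $seen{$File::Find::dir} = 1;  # remember the containing directory
                last;                         # one hit per file is enough
            }
        }
        close $fh;
    }, $top);
    my @dirs = sort keys %seen;
    return @dirs;
}

# Demo against a throwaway directory (an assumption, not the asker's data).
my $top = tempdir(CLEANUP => 1);
open my $out, '>', "$top/a.conf" or die "Can't write $top/a.conf: $!";
print {$out} "keyword : Multinode\n";
close $out;

my @found   = dirs_with_phrase($top, 'keyword : Multinode');
my @missing = dirs_with_phrase($top, 'keyword : Threads');
print "found in: @found\n";
print "Threads matches: ", scalar(@missing), "\n";
```

Reading into $line rather than $_ keeps the file name that File::Find placed in $_ intact, and \Q...\E lets the phrase be passed as plain text.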

Perl File Name Change

I am studying and extending a Perl script written by others. It has a line:
@pub=`ls $sourceDir | grep '\.htm' | grep -v Default | head -550`;
foreach (@pub) {
my $docName = $_;
chomp($docName);
$docName =~ s/\.htm$//g;
............}
I know that it uses a UNIX command firstly to take out all the htm files, then get rid of file extension.
Now I need to do one thing, which is also very important. That is, I need to change the file name of the actual files stored, by replacing the white space with underscore. I am stuck here because I am not sure whether I should follow his code style, achieving this by using UNIX, or I should do this in Perl? The point is that I need to modify the real file on the disk, not the string which used to hold the file name.
Thanks.
Something like this should help (not tested)
use File::Basename;
use File::Spec;
use File::Copy;
use strict;
my @files = grep { ! /Default/ } glob("$sourceDir/*.htm");
# I didn't implement the "head -550" part as I don't understand the point.
# But you can easily do it using `splice()` function.
foreach my $file (@files) {
    next unless (-f $file); # Don't rename directories!
    my $dirname = dirname($file); # file's directory, so we rename only the file itself.
    my $file_name = basename($file); # File name for renaming.
    my $new_file_name = $file_name;
    $new_file_name =~ s/ /_/g; # replace all spaces with underscores
    rename($file, File::Spec->catfile($dirname, $new_file_name))
        or die $!; # Error handling - what if we couldn't rename?
}
It will be faster to use File::Copy to move the file to its new name rather than using this method, which forks off a new process, spawns a new shell, etc. That takes more memory and is slower than doing it within Perl itself.
Edit: you can get rid of all that backtick b.s., too, like this:
my @files = grep {!/Default/} glob "$sourceDir/*.htm";
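Here is a self-contained sketch of the rename step that can be tried safely. It works in a temporary directory rather than the real $sourceDir, and the two file names are invented for the demo. It uses readdir instead of glob because glob splits its pattern on whitespace, which matters if the directory path itself contains spaces.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Throwaway directory standing in for $sourceDir (an assumption for this demo).
my $source_dir = tempdir(CLEANUP => 1);
for my $name ('my first page.htm', 'Default page.htm') {
    my $path = File::Spec->catfile($source_dir, $name);
    open my $fh, '>', $path or die "Can't create $path: $!";
    close $fh;
}

# Read all the names first, then rename, so the listing is complete up front.
opendir my $dh, $source_dir or die "Can't open $source_dir: $!";
my @names = grep { /\.htm\z/ && !/Default/ && / / } readdir $dh;
closedir $dh;

for my $name (@names) {
    (my $new_name = $name) =~ tr/ /_/;    # copy the name, then swap spaces for underscores
    my $old_path = File::Spec->catfile($source_dir, $name);
    my $new_path = File::Spec->catfile($source_dir, $new_name);
    if (-e $new_path) {                   # don't clobber an existing file
        warn "Skipping $name: $new_name already exists\n";
        next;
    }
    rename $old_path, $new_path or die "Can't rename $old_path: $!";
    print "$name -> $new_name\n";
}
```

The built-in rename changes the file on disk, which is the part the question was stuck on; the existence check is an extra precaution that the answers above do not include.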

Perl - How to open directory - Return name of lowest numerically numbered filename using posix and or abs?

Well I am back again, stuck on another seemingly simple routine.
I need to figure out how to do this with Perl.
1- I open a directory full of files named 1.txt, 2.txt ~ 100.txt.
(But sometimes the lowest numbered filename could in fact be any number (27.txt) due to 0-26.txt already removed from directory.)
(I found out how to implement ABS sort so; 1,2,3 not 1,10,11 ~ 2,20 was the order returned.)
use POSIX;
my @files = </home/****/users/*.txt>;
foreach $file (@files) {
##$file ABS($file)
##and so on..
##EXAMPLE NOT TRIED
}
2- I just want to return the lowest numbered file name in the directory into a $var.
Do I have to read the whole directory into an array, do an abs sort, then grab the first one in the array off?
Is there a more efficient way to grab the lowest numbered file?
More info:
The files were created by/with a loop so, I also contemplated grabbing the oldest file first if the creation time is actually that sensitive. But, I am a beginner and don't know if creation time is accurate enough, and how to use it or if in fact that is a viable solution.
Thanks for the help, I always find the best people here.
use strict;
use warnings;
use File::Slurp qw(read_dir);
use File::Spec::Functions qw(catfile);
my $directory = 'some/directory';
my @files = read_dir($directory);
my @ordered;
{
    no warnings 'numeric';
    @ordered = sort { $a <=> $b } @files;
}
my $lowest_file = catfile $directory, $ordered[0];
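On the "more efficient way" part of the question: the names still have to be read, but a full sort is not required just to find the smallest number. A minimal sketch using the core List::Util module, with a made-up @names list standing in for the directory listing:

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical listing; in practice this would come from readdir or read_dir.
my @names = qw(27.txt 100.txt 3.txt notes.txt 30.txt);

# Keep only the numeric part of names that look like NUMBER.txt.
my @numbers = map { /^(\d+)\.txt\z/ ? $1 : () } @names;

my $lowest = @numbers ? min(@numbers) . '.txt' : undef;
print "$lowest\n" if defined $lowest;
```

Extracting the digits first also avoids the need for no warnings 'numeric', since only real numbers are compared. This assumes names without leading zeros, as in the question.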