Using perl script find file count with multiple pattern - perl

I want to count the files inside a directory.
If the number of files in the folder is less than or equal to 5, list the file names with the count;
otherwise, if no files are found in the directory, print something like "zero files found".
Example: the directory contains files like
testfile1d_01012022.txt
testfile2d_01012022.txt
testfile4d_01012022.txt
testfile9d_01012022.txt
testfile7d_01012022.txt
Based on the above file names, I gave the search pattern
$Filename = "testfile*d*";
But the final result looks like this:
testfiled_01012022.txt
testfiled_01012022.txt
testfiled_01012022.txt
testfiled_01012022.txt
testfiled_01012022.txt (the number before the d has been removed)
The following lines are used in the script:
my @files = grep !/^\.\.?$/, readdir DIR;
$Filename =~ s/\*//g;
Please let me know how to show the available files with their count.

You say that you are using this "search pattern".
$Filename = "testfile*d*"
What that looks like is a shell filename pattern, like if you were to say ls testfile*d* in the shell. That's not how regular expressions work.
If you want a pattern that matches "testfile" and then some arbitrary string and then "d" and then some other arbitrary string, you would write that as
testfile.*d.*
Second, you're not using that pattern correctly. Your code shows:
my @files = grep !/^\.\.?$/, readdir DIR;
$Filename =~ s/\*//g;
The first line does read a list of files, but the second line only strips the asterisks out of $Filename, which doesn't do anything useful. What you want your second line to be is:
my @matched_files = grep /testfile.*d.*/, @files;
When you do that, @matched_files will have all the files that match that regular expression.
HOWEVER, you don't have to use regular expressions at all to get a list of files. You can use the built-in glob function that DOES handle shell globbing like you want, and you don't need that readdir stuff.
my @files = glob( 'testfile*d*' ); # Note it uses shell glob syntax, not regex
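To get the count the question asks about, the list can simply be evaluated in scalar context. What follows is only a sketch under assumptions: the helper name matching_files and the throwaway demo directory are made up for illustration, the limit of 5 comes from the question, and the handling of more than 5 files is a guess since the question does not say what should happen then. It uses bsd_glob from File::Glob rather than plain glob so that a directory name containing spaces would not split the pattern.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical helper: return the paths in $dir that match a shell-style pattern.
sub matching_files {
    my ($dir, $pattern) = @_;
    return bsd_glob("$dir/$pattern");
}

# Demo setup: a temporary directory holding the file names from the question.
my $dir = tempdir(CLEANUP => 1);
for my $n (1, 2, 4, 9, 7) {
    open my $fh, '>', "$dir/testfile${n}d_01012022.txt"
        or die "Cannot create file: $!";
    close $fh;
}

my @files = matching_files($dir, 'testfile*d*');
my $count = @files;    # an array in scalar context yields its length

if ($count == 0) {
    print "zero files found\n";
}
elsif ($count <= 5) {
    print "$count files found:\n";
    print "$_\n" for @files;
}
else {
    # Assumption: the question does not define this case.
    print "$count files found (more than 5, names not listed)\n";
}
```

The printed names include the directory part; File::Basename's basename could strip it if only the bare file names are wanted.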

Related

Using chop in grep expression

My Perl script searches a directory of file names, using grep to output only file names without the numbers 2-9 in their names. That means, as intended, that file names ending with the number "1" will also be returned. However, I want to use the chop function to output these file names without the "1", but can't figure out how. Perhaps the grep and chop functions can be combined in one line of code to achieve this? Please advise. Thanks.
Here's my Perl script:
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/Users/jdm/Desktop/xampp/htdocs/cnc/images/plants';
opendir(DIR, $dir);
my @files = grep (/^[^2-9]*\.png\z/,readdir(DIR));
foreach my $file (@files) {
print "$file\n";
}
Here's the output:
Ilex_verticillata.png
Asarum_canadense1.png
Ageratina_altissima.png
Lonicera_maackii.png
Chelone_obliqua1.png
Here's my desired output with the number "1" removed from the end of file names:
Ilex_verticillata.png
Asarum_canadense.png
Ageratina_altissima.png
Lonicera_maackii.png
Chelone_obliqua.png
The number 1 to remove is at the end of the name before the extension; this is different from filtering on numbers (2-9) altogether and I wouldn't try to fit it into one operation.
Instead, once you have your filtered list (no 2-9 in names), clip off that 1. Since all names of interest end in .png, you can simply use a regex
$filename =~ s/1\.png\z/.png/;
and if there is no 1 right before .png the string is unchanged. If it were possible to have other extensions involved then you should use a module to break up the filename.
To incorporate this, you can pass grep's output through a map
opendir my $dfh, $dir or die "Can't open $dir: $!";
my @files =
    map { s/1\.png\z/.png/r }
    grep { /^[^2-9]*\.png\z/ }
    readdir $dfh;
where I've also introduced a lexical directory filehandle instead of a glob, and added a check on whether opendir worked. The /r modifier on the substitution in map is needed so that the string is returned (changed or unchanged if regex didn't match), and not changed in place, as needed here.
This passes over the list of filenames twice, though, while one can use a straight loop. In principle that may impact performance; however, here all operations are done on each element of a list so a difference in performance is minimal.
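For comparison, the straight loop mentioned above might look like the following. This is a sketch only: the helper name clean_names and the sample names are invented for illustration, and it copies each name before editing so it also runs on Perl versions that lack the /r modifier.

```perl
use strict;
use warnings;

# Hypothetical helper: keep .png names without the digits 2-9 and drop a "1"
# that sits right before ".png", all in a single pass.
sub clean_names {
    my @names = @_;
    my @result;
    for my $name (@names) {
        next unless $name =~ /^[^2-9]*\.png\z/;     # same filter as the grep
        (my $clean = $name) =~ s/1\.png\z/.png/;    # copy first, then edit the copy
        push @result, $clean;
    }
    return @result;
}

my @sample = qw(Ilex_verticillata.png Asarum_canadense1.png Chelone_obliqua2.png notes.txt);
print "$_\n" for clean_names(@sample);
```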
You could use the following:
s/1//g for @files;
It's also possible to integrate a solution into your chain using map.
my @files =
    map s/1//rg,
    grep /^[^2-9]*\.png\z/,
    readdir(DIR);

Get the path for a similarly named file in perl, where only the extension differs?

I'm trying to write an Automator service, so I can chuck this into a right-click menu in the gui.
I have a filepath to a txt file, and there is a similarly named file that varies only in the file extension. This can be a pdf or a jpg, or potentially any other extension, no way to know beforehand. How can I get the filepath to this other file (there will only be one such)?
$other_name =~ s/txt$/!(txt)/;
$other_name =~ s/ /?/g;
my @test = glob "$other_name";
In Bash, I'd just turn on the extglob option, change the "txt" at the end to "!(txt)" and then do glob expansion. But I'm not even sure if that's available in perl. And since the filepaths always have spaces (it's in one of the near-root directory names), that further complicates things. I've read through the glob() documentation at http://perldoc.perl.org/functions/glob.html and tried every variation of quoting (the above example code shows my attempt after having given up, where I just remove all the spaces entirely).
It seems like I'm able to put modules inside the script, so this doesn't have to be bare perl (just ran a test).
Is there an elegant or at least simple way to accomplish this?
You can extract everything in the filename up to extension, then run a glob with that and filter out the unneeded .txt. This is one of those cases where you need to protect the pattern in the glob with a double set of quotes, for spaces.
use warnings;
use strict;
use feature qw(say);
my $file = "dir with space/file with spaces.txt";
# Pull the full name without extension
my ($basefname) = $file =~ m/(.*)\.txt$/;
# Get all files with that name and filter out unneeded (txt)
my @other_exts = grep { not /\.txt$/ } glob(qq{"$basefname.*"});
say for @other_exts;
With a toy structure like this
dir with space/
    file with spaces.pdf
    file with spaces.txt
The output is
dir with space/file with spaces.pdf
This recent post has more on related globs.
Perl doesn't allow the not substring construct in glob. You have to find all files with the same name and any extension, and remove the one ending with .txt.
This program shows the idea. It splits the original file name into a stem part and a suffix part, and uses the stem to form a glob pattern. The grep removes any result that ends with the original suffix.
It picks only the first matching file name if there is more than one candidate. $other_name will be set to undef if no matching file was found.
The original file name is expected as a parameter on the command line.
The result is printed to STDOUT; I don't know what you need for your right-click menu.
The line use File::Glob ':bsd_glob' is necessary if you are working with file paths that contain spaces, as it seems you are.
use strict;
use warnings 'all';
use File::Glob ':bsd_glob';
my ($stem, $suffix) = shift =~ /(.*)(\..*)/;
my ($other_name) = grep ! /\Q$suffix\E$/i, glob "$stem.*"; # \Q...\E so the dot in the suffix is matched literally
$other_name =~ tr/ /?/;
print $other_name, "\n";
This is an example, based on File::Basename core module
use File::Basename;
my $fullname = "/path/to/my/filename.txt";
my ($name, $path, $suffix) = fileparse($fullname, qw/.txt/);
my $new_filename = $path . $name . ".pdf";
# $name --> filename
# $path --> /path/to/my/
# $suffix --> .txt
# $new_filename --> /path/to/my/filename.pdf
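When the other extension is not known in advance, fileparse can be combined with a glob. The following is a sketch under assumptions, not a drop-in solution: the helper name sibling_of and the temporary demo directory are invented here, and bsd_glob from File::Glob is used so that spaces in the path are taken literally.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical helper: return the file that shares $fullname's stem but has a
# different extension, or undef if there is none.
sub sibling_of {
    my ($fullname) = @_;
    # qr/\.[^.]*/ makes fileparse treat the last ".something" as the suffix
    my ($name, $path, $suffix) = fileparse($fullname, qr/\.[^.]*/);
    my @candidates = grep { $_ ne $fullname } bsd_glob("$path$name.*");
    return $candidates[0];
}

# Demo in a throwaway directory whose name contains a space.
my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/dir with space";
mkdir $dir or die "Cannot create $dir: $!";
for my $ext (qw(txt pdf)) {
    open my $fh, '>', "$dir/file with spaces.$ext"
        or die "Cannot create file: $!";
    close $fh;
}

my $other = sibling_of("$dir/file with spaces.txt");
print defined $other ? "$other\n" : "no sibling found\n";
```

Note that glob metacharacters such as * or [ in the file name itself would still be interpreted as wildcards by this sketch.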

pattern search in all the files in a directory

I have a pattern like "keyword : Multinode". I need to search for this pattern in all the files in a directory. If the pattern is found in any of the files, a non-empty string should be returned. It may contain the file name or the directory name.
In shell scripting the following will do the same
KeyMnode=$(grep -w "keyword : Multinode" ${dirname}/*)
I thought of using find(subroutine,directory_path) and inside the sub-routine I want to traverse through the entire directory for all its entries. For every entry I want to put a check whether it is a readable file or not. If the file is readable, I want to search for the required pattern "keyword : Multinode" in the file found. If we hit with a success, the entire find command should result in a non-empty string(preferably only the existing directory Name) otherwise with an empty string. Please let me know if you need any further information.
I want this to be done using perl. Please help me with the solution.
Here are some Perl tools that will be useful in doing what you described:
File::Find will do a recursive search for files in a directory and its children, running code (the \&wanted callback in the docs) against each one to determine whether it meets your criteria or not
The -r operator will tell you whether a file is readable (if (-r $file_name)...)
open will get you access to the file and <$fh> will read its contents so that you can check with a regular expression whether they match your target pattern
Adding \b to the beginning and end of the pattern will cause it to match only at word boundaries, similar to grep's -w switch
If you have more specific issues, please post additional questions with code that demonstrates them, including statements both of what you expected to happen and of how the actual results differed from your expectation and we'll be happy to help resolve those issues.
Edit: Cleaned up and runnable version of code from comment:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
use File::Find;
# Get $dirname from first command-line argument
my $dirname = shift @ARGV;
# Declare the result variables before find() runs so the callback can fill them
my ($KeyMnode, $KeyThreads);
find(\&do_process, $dirname); # quotes around $dirname weren't needed
sub do_process {
    # chomp($_); - not needed; $_ isn't read from a file, so no newline on it
    if (-r $_) { # quotes around $_ weren't needed
        # $_ is just the final part of the file name; it may be better for
        # reporting the location of matches to set $file_name to
        # $File::Find::name instead
        my $file_name = $_;
        open(my $fh, '<', $file_name) or return; # Use three-arg open! Skip files that can't be opened
        while (<$fh>) {
            chomp();
            # Note that, if you store all matches into the same scalar values,
            # you'll end up with only the last value found for each pattern; you
            # may want to push the matches onto arrays instead.
            if (/\bkeyword : Multinode\b/i) { $KeyMnode = "$file_name:$_"; }
            if (/\bkeyword : Threads\b/i) { $KeyThreads = "$file_name:$_"; }
        }
    }
}
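The comments above suggest pushing matches onto an array and reporting $File::Find::name. One possible variant along those lines is sketched below; the helper name find_matches, the file name job.cfg and its contents are made up for the demonstration and are not part of the original code.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: return "path:line" for every line under $dir that
# matches $pattern, looking only at plain readable files.
sub find_matches {
    my ($dir, $pattern) = @_;
    my @matches;
    find(sub {
        return unless -f $_ && -r _;       # skip directories and unreadable files
        my $path = $File::Find::name;      # full path, better for reporting
        open my $fh, '<', $_ or return;    # skip files that cannot be opened
        while (my $line = <$fh>) {
            chomp $line;
            push @matches, "$path:$line" if $line =~ $pattern;
        }
        close $fh;
    }, $dir);
    return @matches;
}

# Demo: one file with a matching line in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
open my $out, '>', "$dir/job.cfg" or die "Cannot create file: $!";
print {$out} "name : demo\nkeyword : Multinode\n";
close $out;

my @hits = find_matches($dir, qr/\bkeyword : Multinode\b/i);
print "$_\n" for @hits;
print "no match\n" unless @hits;
```

An empty list from find_matches plays the role of the empty string the question asks for.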

How do you list all files without a suffix using Perl?

I want to add a suffix to all the files without a suffix in a directory with mixed content.
I only want to fetch files without a suffix
Then I want to add a suffix to them (like, say, .txt or .html)
It's part one I'm having trouble with.
I'm using glob to fetch all the files. Here's the code excerpt:
my @files = grep ( -f, <*> );
-f makes sure only files are added, and the * wildcard allows all names.
But how do I rewrite that to only fetch files that have no suffix? Or in the least, how do I wash the array of suffixed files?
A file without a suffix is one without a dot in its name.
my @files = grep { -f and not /\./ } <*>;
You can just tack on another grep statement:
my @files = grep !/\.\w{1,4}$/, grep -f, <*>;
#                               -------------------
Or you can, as Borodin points out, do it in one:
my @files = grep !/\.\w{1,4}$/ && -f, <*>;
You can change the regex to fit better depending on what type of suffixes you have. The regex looks for files which do not match a period, followed by 1 to 4 alphanumeric characters, at the end of the string. I opted for a rather loose regex to match a multitude of possible suffixes.
Using grep, all you need is to add a regular expression:
my @files = grep { !/\.\w+\z/ && -f } ( <*> );
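For the second part of the question, adding the suffix, Perl's built-in rename can be applied to the filtered list. This is a sketch under assumptions: the helper name add_suffix, the demo file names and the choice to skip a file when the target name already exists are all illustrative, and it takes "no suffix" to mean "no dot in the name".

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: append $suffix to every plain file in $dir whose name
# contains no dot, and return the new names in sorted order.
sub add_suffix {
    my ($dir, $suffix) = @_;
    opendir my $dh, $dir or die "Can't open $dir: $!";
    my @bare = grep { !/\./ && -f "$dir/$_" } readdir $dh;
    closedir $dh;

    my @renamed;
    for my $name (@bare) {
        my $new = "$dir/$name$suffix";
        next if -e $new;    # assumption: never overwrite an existing file
        rename "$dir/$name", $new or die "Can't rename $name: $!";
        push @renamed, "$name$suffix";
    }
    my @sorted = sort @renamed;
    return @sorted;
}

# Demo in a temporary directory with mixed content.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(README notes index.html)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create file: $!";
    close $fh;
}

my @done = add_suffix($dir, '.txt');
print "$_\n" for @done;
```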

Questions About Perl Filename Wildcard

I am using Perl to process some text files. I want to use a Perl filename wildcard to find all the useful files in a folder and process them one by one, but there are spaces in the file names, and the wildcard does not handle those names properly. Here is my code:
my $term = "Epley maneuver";
my @files = <rawdata/*$term*.csv>;
my %infiles;
foreach my $infilename (@files) {
if($infilename =~ m/(\d+)_.*\.csv/)
{
$infiles{$infilename} = $1;
print $1."\n";
}
}
The filename are like:
34_Epley maneuver_2012_4_6.csv
33_Epley maneuver_2012_1_3.csv
32_Epley maneuver_2011_10_12.csv
...
They are in a folder named "rawdata".
When I use this for terms that don't contain spaces, like "dizzy", it works well. But when the term contains a space, it just stops working. I searched for this on Google, but found little useful information.
What happens and how can I do this correctly?
Any help will be good. Thanks a lot.
The glob operator works like the command-line processor. If you write <rawdata/*Epley maneuver*.csv> it will look for files that match rawdata/*Epley or maneuver*.csv
You must put your glob expression in double-quotes:
my @files = <"rawdata/*$term*.csv">;
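An alternative that avoids the extra quotes is bsd_glob from the core File::Glob module, which does not split its pattern on whitespace. The sketch below assumes a temporary directory in place of rawdata and reuses the file names from the question, plus one made-up non-matching name, purely for demonstration.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Demo setup: a temporary directory standing in for "rawdata".
my $dir = tempdir(CLEANUP => 1);
for my $name ('34_Epley maneuver_2012_4_6.csv',
              '33_Epley maneuver_2012_1_3.csv',
              '12_dizzy_2011_1_1.csv') {
    open my $fh, '>', "$dir/$name" or die "Cannot create file: $!";
    close $fh;
}

my $term = 'Epley maneuver';
my %infiles;
# bsd_glob treats the space in $term as an ordinary character
for my $infilename (bsd_glob("$dir/*$term*.csv")) {
    # capture the leading number of the last path component
    $infiles{$infilename} = $1 if $infilename =~ m{/(\d+)_[^/]*\.csv\z};
}

print "$infiles{$_}\n" for sort keys %infiles;
```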