Perl regex search of file name and extensions against a predefined array - perl

I want to filter out some files from a directory. I can already collect the files and their extensions recursively; what I want now is to match each file's extension and name against a predefined array of extensions and file names, using a wildcard search like SQL's LIKE.
my @ignore_exts = qw( .vmdk .iso .7z .bundle .wim .hd .vhd .evtx .manifest .lib .mst );
I want to filter out any file whose extension is in the list above.
e.g. The file name is abc.1.149_1041.mst, and since the extension .mst is present in @ignore_exts, I want this file filtered out. The extension I am getting is '.1.149_1041.mst'. In SQL I would do something like select * from <some-table> where extension like '%.mst'. I want to do the same thing in Perl.
This is what I am using for grabbing the extension.
my $ext = (fileparse($filepath, '\..*?')) [2];

In order to pull a file extension off a filename this should work:
$filepath =~ /^(.*)\.([^.]+)$/;
$fileName = $1;
$extension = $2;
This might do the trick for you.
Input: a.b.c.text
$1 will be a.b.c
$2 will be text
Basically this takes everything from the start of the line up to the last period and captures it as group 1, and then captures everything after the last period to the end of the line as group 2.
You can see a sample here: http://regex101.com/r/vX3dK1
As for checking whether the extension exists in the array, read here: (How can I check if a Perl array contains a particular value?)
# $extension has no leading dot, but the list entries do, so add it back before comparing
if (grep { $_ eq ".$extension" } @array) {
print "Extension Found\n";
}
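Here is a minimal sketch of the same exact-match check done with a hash lookup instead of grep. It assumes the ignore list keeps the leading dot, as in the question. The helper name is_ignored and the %ignore hash are made up for illustration.

```perl
use strict;
use warnings;

# Extensions to ignore, stored with the leading dot as in the question.
my @ignore_exts = qw( .vmdk .iso .7z .bundle .wim .hd .vhd .evtx .manifest .lib .mst );

# Build a lookup hash once so each check is a single hash access.
my %ignore = map { lc($_) => 1 } @ignore_exts;

# is_ignored is a hypothetical helper: it takes the text from the last dot
# to the end of the name, so "abc.1.149_1041.mst" is checked as ".mst".
sub is_ignored {
    my ($filename) = @_;
    my ($ext) = $filename =~ /(\.[^.\/\\]+)$/ or return 0;    # no extension at all
    return $ignore{ lc $ext } ? 1 : 0;
}

# Sample names are made up for illustration.
print "$_: ", ( is_ignored($_) ? "ignore" : "keep" ), "\n"
    for qw( abc.1.149_1041.mst notes.txt README );
```

The hash makes the comparison case-insensitive via lc and avoids rescanning the list for every file.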

Just turn your list of extensions into a regular expression, and then test against the $filepath.
my @ignore_exts = qw( .vmdk .iso .7z .bundle .wim .hd .vhd .evtx .manifest .lib .mst );
my $ignore_exts_re = '(' . join('|', map quotemeta, @ignore_exts) . ')$';
And then later, to compare:
if ($filepath =~ $ignore_exts_re) {
print "Ignore $filepath because it ends in $1\n";
next;
}
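As a rough sketch of this approach applied to a whole list, the pattern can be compiled once with qr// and used in a grep. The sample paths and the /i flag (case-insensitive matching) are my own assumptions, not part of the question.

```perl
use strict;
use warnings;

my @ignore_exts = qw( .vmdk .iso .7z .bundle .wim .hd .vhd .evtx .manifest .lib .mst );

# quotemeta escapes the dots so they match literally; the trailing $ anchors
# the alternation to the end of the path.
my $alternatives   = join '|', map { quotemeta($_) } @ignore_exts;
my $ignore_exts_re = qr/($alternatives)$/i;

# Sample paths are made up for illustration.
my @paths = qw( abc.1.149_1041.mst disk.ISO notes.txt lib.rs );

# Keep only the paths that do not end in an ignored extension.
my @kept = grep { $_ !~ $ignore_exts_re } @paths;
print "kept: @kept\n";    # prints "kept: notes.txt lib.rs"
```

Note that lib.rs survives: the pattern only matches .lib at the very end of the path.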

Related

Using perl script find file count with multiple pattern

I want to count the files inside a directory that match a pattern.
If the number of files in the folder is less than or equal to 5, list the file names with the count;
if no files are found in the directory, print something like "zero files found".
Example: the directory contains files like
testfile1d_01012022.txt
testfile2d_01012022.txt
testfile4d_01012022.txt
testfile9d_01012022.txt
testfile7d_01012022.txt
Based on the above file names, I gave the search pattern
$Filename = "testfile*d*";
But the final result looks like this:
testfiled_01012022.txt
testfiled_01012022.txt
testfiled_01012022.txt
testfiled_01012022.txt
testfiled_01012022.txt (the digit before "d" has been removed)
The lines below are used in the script.
my @files = grep !/^\.\.?$/, readdir DIR;
$Filename =~ s/\*//g;
Let me know how to show available files with count.
You say that you are using this "search pattern".
$Filename = "testfile*d*"
What that looks like is a shell filename pattern, like if you were to say ls testfile*d* in the shell. That's not how regular expressions work.
If you want a pattern that matches "testfile" and then some arbitrary string and then "d" and then some other arbitrary string, you would write that as
testfile.*d.*
Second, you're not using that pattern correctly. Your code shows:
my @files = grep !/^\.\.?$/, readdir DIR;
$Filename =~ s/\*//g;
The first line does read a list of files, but the second line doesn't do anything useful. What you want your second line to be is:
my @matched_files = grep /testfile.*d.*/, @files;
When you do that, @matched_files will have all the files that match that regular expression.
HOWEVER, you don't have to use regular expressions at all to get a list of files. You can use the built-in glob function that DOES handle shell globbing like you want, and you don't need that readdir stuff.
my @files = glob( 'testfile*d*' ); # Note it uses shell glob syntax, not regex
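To get the count the question asks for, something like the following sketch could work. The helper list_matches and the exact messages are assumptions for illustration. It also assumes the directory path contains no spaces or glob metacharacters, since glob would treat those specially.

```perl
use strict;
use warnings;

# list_matches is a hypothetical helper: it globs inside $dir and returns the
# matching regular files. $pattern uses shell glob syntax, not regex syntax.
sub list_matches {
    my ($dir, $pattern) = @_;
    return grep { -f $_ } glob("$dir/$pattern");
}

my @found = list_matches( '.', 'testfile*d*' );

# An array in scalar context gives its element count.
if (@found) {
    print scalar(@found), " file(s) found:\n";
    print "  $_\n" for @found;
}
else {
    print "zero files found\n";
}
```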

Perl: Select Filepath for csv file inside folder

I have a Perl Script which does some data manipulation with a selected CSV file. In the past, I have renamed the CSV file to match the one specified inside my script.
I now want to change it so that the sole file in a folder is selected, but the csv file is not always named the same. There will only ever be a single file in the folder.
I currently use this method;
my $filepath_in = 'C:\delete_csv_files\files_new\input.csv';
my $filepath_out = 'C:\delete_csv_files\files_processed\output.csv';
open my $in, '<:encoding(utf8)', $filepath_in or die;
open my $out, '>:encoding(utf8)', $filepath_out or die;
I also want the file to retain its original name after it's been processed.
Can anyone give me any pointers?
As suggested by toolic and commented by ikegami, you can use glob.
my ($filepath_in) = glob 'C:\delete_csv_files\files_new\*';
Then you can use a regex to generate the name of the output file, like:
(my $filepath_out = $filepath_in) =~ s!\\files_new\\!\\files_processed\\!;
This will give you a file with the same name, in directory files_processed.
If you want to force the name of the output file to output.csv like in your code snippet, then use this regex instead:
(my $filepath_out = $filepath_in) =~ s!\\files_new\\.*$!\\files_processed\\output.csv!;
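Putting the pieces together, here is a minimal sketch of the directory swap. The helper output_path_for and the sample file name sales_2023.csv are made up for illustration. In a real script the input path would come from the glob call rather than being hard-coded.

```perl
use strict;
use warnings;

# output_path_for is a hypothetical helper that swaps the input directory
# name for the output directory name and leaves the file name untouched.
sub output_path_for {
    my ($filepath_in) = @_;
    (my $filepath_out = $filepath_in) =~ s!\\files_new\\!\\files_processed\\!;
    return $filepath_out;
}

# In the real script the path would come from glob; the parentheses on the
# left take the first (and only) file it returns:
# my ($filepath_in) = glob 'C:\delete_csv_files\files_new\*';
my $filepath_in = 'C:\delete_csv_files\files_new\sales_2023.csv';    # made-up name

# glob yields nothing when the folder is empty, so check before opening.
die "No input file found\n" unless defined $filepath_in;

print output_path_for($filepath_in), "\n";
```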

Get the path for a similarly named file in perl, where only the extension differs?

I'm trying to write an Automator service, so I can chuck this into a right-click menu in the gui.
I have a filepath to a txt file, and there is a similarly named file that varies only in the file extension. This can be a pdf or a jpg, or potentially any other extension, no way to know beforehand. How can I get the filepath to this other file (there will only be one such)?
$other_name =~ s/txt$/!(txt)/;
$other_name =~ s/ /?/g;
my @test = glob "$other_name";
In Bash, I'd just turn on the extglob option, change the "txt" at the end to "!(txt)", and then do glob expansion. But I'm not even sure if that's available in perl. And since the filepaths always have spaces (it's in one of the near-root directory names), that further complicates things. I've read through the glob() documentation at http://perldoc.perl.org/functions/glob.html and tried every variation of quoting (the above example code shows my attempt after having given up, where I just remove all the spaces entirely).
It seems like I'm able to put modules inside the script, so this doesn't have to be bare perl (just ran a test).
Is there an elegant or at least simple way to accomplish this?
You can extract everything in the filename up to extension, then run a glob with that and filter out the unneeded .txt. This is one of those cases where you need to protect the pattern in the glob with a double set of quotes, for spaces.
use warnings;
use strict;
use feature qw(say);
my $file = "dir with space/file with spaces.txt";
# Pull the full name without extension
my ($basefname) = $file =~ m/(.*)\.txt$/;
# Get all files with that name and filter out unneeded (txt)
my @other_exts = grep { not /\.txt$/ } glob(qq{"$basefname.*"});
say for @other_exts;
With a toy structure like this
dir with space/
    file with spaces.pdf
    file with spaces.txt
The output is
dir with space/file with spaces.pdf
This recent post has more on related globs.
Perl doesn't allow the not substring construct in glob. You have to find all files with the same name and any extension, and remove the one ending with .txt
This program shows the idea. It splits the original file name into a stem part and a suffix part, and uses the stem to form a glob pattern. The grep removes any result that ends with the original suffix
It picks only the first matching file name if there is more than one candidate. $other_name will be set to undef if no matching file was found
The original file name is expected as a parameter on the command line
The result is printed to STDOUT; I don't know what you need for your right-click menu
The line use File::Glob ':bsd_glob' is necessary if you are working with file paths that contain spaces, as it seems you are
use strict;
use warnings 'all';
use File::Glob ':bsd_glob';
my ($stem, $suffix) = shift =~ /(.*)(\..*)/;
my ($other_name) = grep ! /\Q$suffix\E$/i, glob "$stem.*"; # \Q...\E makes the dot in the suffix match literally
$other_name =~ tr/ /?/;
print $other_name, "\n";
This is an example, based on File::Basename core module
use File::Basename;
my $fullname = "/path/to/my/filename.txt";
my ($name, $path, $suffix) = fileparse($fullname, qw/.txt/);
my $new_filename = $path . $name . ".pdf";
# $name --> filename
# $path --> /path/to/my/
# $suffix --> .txt
# $new_filename --> /path/to/my/filename.pdf
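If the other extension is not known in advance, fileparse can be combined with a plain directory listing. The sketch below swaps glob for opendir/readdir so that spaces in the path need no quoting. The helper find_sibling is hypothetical and not from any of the answers above.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# find_sibling is a hypothetical helper. It reads the directory and compares
# base names, returning the first file (in sorted order) that has the same
# name but a different extension, or undef if there is none.
sub find_sibling {
    my ($fullname) = @_;
    my ($name, $path, $suffix) = fileparse( $fullname, qr/\.[^.]*/ );

    opendir my $dh, $path or die "Cannot open $path: $!";
    my @entries = readdir $dh;
    closedir $dh;

    my @candidates = grep {
        my ($n, undef, $s) = fileparse( $_, qr/\.[^.]*/ );
        $n eq $name && $s ne '' && lc($s) ne lc($suffix);
    } @entries;
    @candidates = sort @candidates;

    return @candidates ? $path . $candidates[0] : undef;
}

# Usage: pass the path of the .txt file on the command line.
if (@ARGV) {
    my $other = find_sibling( $ARGV[0] );
    print defined $other ? "$other\n" : "no sibling found\n";
}
```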

Extracting and replacing filename extensions in Perl

Before I start off, I'd like to let you know that I'm no Perl expert. I'm just starting out because of some specific tasks assigned to me.
The requirement of this task is to extract the extension of the file (.dat) and replace it with .trg. The problem is that we are zipping the .dat file to make $filename.dat.gz, and when we extract the extension and replace it we get $filename.dat.trg, while what we would ideally want is $filename.trg.
As for the code (mind you, this seems to be a very old 'legacy' code and I don't want to tinker with it too much as it was/is being maintained by another person), this is how it is put down
#prepare the trigger file
#get the extension
my @contains_extension = split (/\./ , $filename);
my $ext = $contains_extension[-1];
#replace with a ".trg" extension
my $remote_trgfile = $filename;
$remote_trgfile =~ s/$ext$/trg/;
my $trgfile = $out;
$trgfile =~ s/$ext$/trg/;
Remember $filename in the above code is suffixed with .dat.gz i.e., the filename is $filename.dat.gz
I would appreciate if someone could help me out with an easier way to extract both the extensions (.dat and .gz) and replacing it with .trg
So you want to change the 'extension' of a filename, including an optional .gz? Try:
$filename =~ s{\.[^.]*(?:\.gz)?$}{.trg};
Try
$filename =~ s/\..*$/.trg/;
no need to do all of that fancy splitting stuff to try and capture the extension :)
breakdown:
\. matches a literal .
.* matches everything (except newline)
$ matches the end of the string
then you're just replacing that with .trg
so you're basically just taking everything from the first "." onward and replacing it with .trg
hope that helps :)
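The two substitutions above behave differently when the name contains more than one dot before the extension. Here is a small sketch to compare them. The helper names to_trg_last and to_trg_first and the sample file names are made up for illustration.

```perl
use strict;
use warnings;

# to_trg_last replaces only the final extension plus an optional trailing .gz.
sub to_trg_last {
    my ($filename) = @_;
    $filename =~ s{\.[^.]*(?:\.gz)?$}{.trg};
    return $filename;
}

# to_trg_first replaces everything from the first dot onward.
sub to_trg_first {
    my ($filename) = @_;
    $filename =~ s/\..*$/.trg/;
    return $filename;
}

# File names below are made up for illustration.
for my $name (qw( report.dat.gz report.dat daily.report.dat.gz )) {
    printf "%-22s last: %-18s first: %s\n",
        $name, to_trg_last($name), to_trg_first($name);
}
```

For daily.report.dat.gz the first version gives daily.report.trg and the second gives daily.trg, so the second is only safe when the base name (and any directory in the path) contains no other dots.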

How do I skip .svn folders with File::Find?

I am writing a Perl script that is iterating over file names in a directory and its sub-directories, using the following method:
find(\&getFile, $mainDir);
sub getFile {
my $file_dir = $File::Find::name;
return unless -f $file_dir; # return if its a folder
}
The file structure looks like this:
main/classes/pages/filename.php
However because of version control each folder and subfolder has a hidden .svn directory that has duplicates of every file inside with a .svn-base suffix:
main/.svn/classes/pages/filename.php.svn-base
I was wondering if there is a return statement like the one I had previously using:
return if ($file_dir eq "something here");
to skip all the .svn folders so that I don't find filenames with the .svn-base suffix. I have been fiddling around with regex and searching for hours without much luck. I have only been using Perl for a couple of days.
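You may use
return if ($file_dir =~ /\.svn/);
The =~ operator tests a variable against a pattern, so this returns early for any path that contains .svn. (The negated operator !~ is equivalent to !($file_dir =~ /\.svn/); it would do the opposite and skip everything except the .svn paths.)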
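An alternative that avoids walking the .svn trees at all is File::Find's $File::Find::prune flag. This is a minimal sketch of that technique rather than the answer's regex test. The helper collect_files is a made-up name.

```perl
use strict;
use warnings;
use File::Find;

# collect_files is a hypothetical helper that returns every regular file
# under $main_dir, skipping any directory named .svn and all of its contents.
sub collect_files {
    my ($main_dir) = @_;
    my @files;

    find(
        sub {
            # $_ is the current entry's base name; $File::Find::name is its full path.
            if ( -d $_ && $_ eq '.svn' ) {
                $File::Find::prune = 1;    # do not descend into this directory
                return;
            }
            push @files, $File::Find::name if -f $_;
        },
        $main_dir
    );

    my @sorted = sort @files;
    return @sorted;
}

# Usage: pass the top-level directory on the command line.
if (@ARGV) {
    print "$_\n" for collect_files( $ARGV[0] );
}
```

Pruning only works with the default top-down traversal, so it should not be combined with finddepth or the bydepth option.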