Extracting and replacing filename extensions in Perl

Before I start off, I'd like to let you know that I'm no Perl expert. I'm just starting out because of some specific tasks assigned to me.
The requirement of this task is to extract the extension of the file (.dat) and replace it with .trg. The problem is that we are zipping the .dat file to make $filename.dat.gz, and when we extract the extension and replace it we get $filename.dat.trg, while what we would ideally want is $filename.trg.
As for the code (mind you, this seems to be a very old 'legacy' code and I don't want to tinker with it too much as it was/is being maintained by another person), this is how it is put down
#prepare the trigger file
#get the extension
my @contains_extension = split (/\./ , $filename);
my $ext = $contains_extension[-1];
#replace with a ".trg" extension
my $remote_trgfile = $filename;
$remote_trgfile =~ s/$ext$/trg/;
my $trgfile = $out;
$trgfile =~ s/$ext$/trg/;
Remember that $filename in the above code is suffixed with .dat.gz, i.e., the filename is $filename.dat.gz
I would appreciate it if someone could help me out with an easier way to extract both extensions (.dat and .gz) and replace them with .trg

So you want to change the 'extension' of a filename, including an optional .gz? Try:
$filename =~ s{\.[^.]*(?:\.gz)?$}{.trg};
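To see how that substitution behaves on a few names, here is a minimal sketch. The helper name to_trigger_name and the sample file names are made up for illustration; only the regex comes from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: swap the last extension (plus an optional trailing .gz) for .trg.
sub to_trigger_name {
    my ($name) = @_;
    # Work on a copy so the caller's string is left untouched.
    (my $trg = $name) =~ s{\.[^.]*(?:\.gz)?$}{.trg};
    return $trg;
}

# Sample names are assumptions, not taken from the legacy script.
for my $name ('report.dat.gz', 'report.dat', 'my.report.dat.gz', 'noext') {
    printf "%-18s -> %s\n", $name, to_trigger_name($name);
}
```

Both report.dat.gz and report.dat should come out as report.trg, and a name without any dot is left unchanged because the pattern does not match.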

try
$filename =~ s/\..*$/.trg/;
no need to do all of that fancy splitting stuff to try and capture the extension :)
breakdown:
\. matches a literal .
.* matches everything (except newline)
$ matches the end of the string
then you're just replacing that with .trg
so you're basically just taking everything from the first "." onwards and replacing it with .trg
hope that helps :)
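One thing to watch with the "first dot" approach: if $filename carries a directory part that itself contains a dot, the substitution starts there. A hedged sketch of one way around that, using fileparse from the core File::Basename module; the helper name trigger_path and the sample path are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: replace every extension of the file name with .trg,
# leaving dots in the directory part alone.
sub trigger_path {
    my ($path) = @_;
    # qr/\..*/ makes fileparse treat everything from the first dot of the
    # file name (not of the whole path) as the suffix.
    my ($name, $dir, $suffix) = fileparse($path, qr/\..*/);
    return $dir . $name . '.trg';
}

my $path = '/data/v1.2/report.dat.gz';    # made-up path with a dotted directory
(my $naive = $path) =~ s/\..*$/.trg/;     # the plain first-dot substitution
print "naive:     $naive\n";
print "fileparse: ", trigger_path($path), "\n";
```

Note that fileparse returns ./ as the directory for a bare file name, so a name with no directory part comes back prefixed with ./ here.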

Related

Get the path for a similarly named file in perl, where only the extension differs?

I'm trying to write an Automator service, so I can chuck this into a right-click menu in the gui.
I have a filepath to a txt file, and there is a similarly named file that varies only in the file extension. This can be a pdf or a jpg, or potentially any other extension, no way to know beforehand. How can I get the filepath to this other file (there will only be one such)?
$other_name =~ s/txt$/!(txt)/;
$other_name =~ s/ /?/g;
my @test = glob "$other_name";
In Bash, I'd just turn on the extglob option, and change the "txt" at the end to "!(txt)" and the do glob expansion. But I'm not even sure if that's available in perl. And since the filepaths always have spaces (it's in one of the near-root directory names), that further complicates things. I've read through the glob() documentation at http://perldoc.perl.org/functions/glob.html and tried every variation of quoting (the above example code shows my attempt after having given up, where I just remove all the spaces entirely).
It seems like I'm able to put modules inside the script, so this doesn't have to be bare perl (just ran a test).
Is there an elegant or at least simple way to accomplish this?
You can extract everything in the filename up to the extension, then run a glob with that and filter out the unneeded .txt. This is one of those cases where you need to protect the pattern in the glob with a double set of quotes, because of the spaces.
use warnings;
use strict;
use feature qw(say);
my $file = "dir space/file with spaces.txt";
# Pull the full name without extension
my ($basefname) = $file =~ m/(.*)\.txt$/;
# Get all files with that name and filter out unneeded (txt)
my @other_exts = grep { not /\.txt$/ } glob(qq{"$basefname.*"});
say for @other_exts;
With a toy structure like this
dir space/
file with spaces.pdf
file with spaces.txt
The output is
dir space/file with spaces.pdf
This recent post has more on related globs.
Perl doesn't allow the not substring construct in glob. You have to find all files with the same name and any extension, and remove the one ending with .txt
This program shows the idea. It splits the original file name into a stem part and a suffix part, and uses the stem to form a glob pattern. The grep removes any result that ends with the original suffix
It picks only the first matching file name if there is more than one candidate. $other_name will be set to undef if no matching file was found
The original file name is expected as a parameter on the command line
The result is printed to STDOUT; I don't know what you need for your right-click menu
The line use File::Glob ':bsd_glob' is necessary if you are working with file paths that contain spaces, as it seems you are
use strict;
use warnings 'all';
use File::Glob ':bsd_glob';
my ($stem, $suffix) = shift =~ /(.*)(\..*)/;
my ($other_name) = grep ! /\Q$suffix\E$/i, glob "$stem.*";
$other_name =~ tr/ /?/;
print $other_name, "\n";
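For trying the idea without touching real files, here is a self-contained sketch that builds a throwaway directory first. The helper name siblings_with_other_ext and the file names are made up; it calls bsd_glob directly, which treats spaces as part of the pattern, and it assumes the stem contains no glob metacharacters such as * or [.

```perl
use strict;
use warnings;
use File::Glob ':bsd_glob';
use File::Temp qw(tempdir);

# Hypothetical helper: list files that share the stem of $path but have a
# different extension.
sub siblings_with_other_ext {
    my ($path) = @_;
    my ($stem, $suffix) = $path =~ m{^(.*)(\.[^./]*)$}s or return;
    # bsd_glob does not split its pattern on whitespace.
    return grep { !/\Q$suffix\E$/i } bsd_glob("$stem.*");
}

# Throwaway directory with two made-up files whose names contain spaces.
my $dir = tempdir(CLEANUP => 1);
for my $name ('file with spaces.txt', 'file with spaces.pdf') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

print "$_\n" for siblings_with_other_ext("$dir/file with spaces.txt");
```

This returns every sibling rather than only the first, so the caller can decide what to do if more than one candidate turns up.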
This is an example, based on File::Basename core module
use File::Basename;
my $fullname = "/path/to/my/filename.txt";
my ($name, $path, $suffix) = fileparse($fullname, qw/.txt/);
my $new_filename = $path . $name . ".pdf";
# $name --> filename
# $path --> /path/to/my/
# $suffix --> .txt
# $new_filename --> /path/to/my/filename.pdf

Perl regex search of file name and extensions against a predefined array

I want to filter out some files from a directory. I am able to grab the files and their extensions recursively, but now what I want to do is to match the file extension and file name with a predefined array of extensions and file names using wildcard search as we use to do in sql.
my @ignore_exts = qw( .vmdk .iso .7z .bundle .wim .hd .vhd .evtx .manifest .lib .mst );
I want to filter out the files that have extensions like the ones above.
e.g. the file name is abc.1.149_1041.mst, and since the extension .mst is present in @ignore_exts, I want it filtered out. The extension I am getting is '.1.149_1041.mst'. In sql I would do something like select * from <some-table> where extension like '%.mst'. I want to do the same thing in perl.
This is what I am using for grabbing the extension.
my $ext = (fileparse($filepath, '\..*?')) [2];
In order to pull a file extension off a filename this should work:
/^(.*)\.([^.]+)$/
$fileName = $1;
$extension = $2;
This might do the trick for you.
Input: a.b.c.text
$1 will be a.b.c
$2 will be text
Basically this will take everything from the start of the line until the last period and group that in the 1st group, and then everything from the last period to the end of the line as group 2
You can see a sample here: http://regex101.com/r/vX3dK1
As for checking whether the extension exists in the array read here: (How can I check if a Perl array contains a particular value?)
if (grep { $_ eq $extension } @array) {
    print "Extension Found\n";
}
Just turn your list of extensions into a regular expression, and then test against the $filepath.
my @ignore_exts = qw( .vmdk .iso .7z .bundle .wim .hd .vhd .evtx .manifest .lib .mst );
my $ignore_exts_re = '(' . join('|', map quotemeta, @ignore_exts) . ')$';
And then later to compare
if ($filepath =~ $ignore_exts_re) {
    print "Ignore $filepath because it ends in $1\n";
    next;
}
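Put together as a runnable sketch, the same idea might look like this. The helper name is_ignored, the shortened extension list, and the sample file names are assumptions for illustration, and the /i flag (case-insensitive matching) is my addition rather than part of the answer above.

```perl
use strict;
use warnings;

# Shortened list for the example; the real list would be the full @ignore_exts.
my @ignore_exts = qw( .iso .7z .mst );

# quotemeta escapes the dots so they match literally; \z anchors at the end.
my $alternation = join '|', map { quotemeta($_) } @ignore_exts;
my $ignore_re   = qr/(?:$alternation)\z/i;

# Hypothetical helper: true if the path ends in one of the ignored extensions.
sub is_ignored {
    my ($path) = @_;
    return $path =~ $ignore_re ? 1 : 0;
}

# Made-up sample names, including the one from the question.
my @files = ('abc.1.149_1041.mst', 'notes.txt', 'disk.ISO', 'mst');
print "$_\n" for grep { !is_ignored($_) } @files;
```

Because the pattern is anchored at the end of the string, it behaves like the sql like '%.mst' test, whatever comes before the final extension.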

Concatenating strings to a path

I'm having some trouble with string concatenation. I'm trying to create a path that will become a .txt file, but it always ends up as just ".txt" with no name. It ends up in the right folder though.
This is what I'm doing:
open TEXT, ">/home/admin/www/build/logs/baseline".$ID."/".$platformName.".txt" or die $!;
So I want to create the file.
"/home/admin/www/build/logs/baseline45/linux.txt"
Where am I messing this up?
Thanks!
You are not messing up in the snippet provided. There are several ways of forming a string that includes values from variables, and you are using one of them (the dot operator) correctly.
Try checking that $platformName really contains what you think it does, and if you haven't already, turn warnings (and preferably strict mode) on.
Turning warnings/strict mode on might give you details of undefined variables, which would be helpful in situations such as this (i.e. is $platformName really the name of the variable you are looking for?)
use warnings;
use strict;
Print the value of $platformName before trying to open the file and you will probably find that it is an empty string (or contains something weird that would explain the results you are getting).
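As a rough sketch of that check, the path can be built in a small function that refuses empty pieces before anything is opened. The helper name log_path and its argument order are assumptions; the base path and the example values 45 and linux come from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: build the log path and die if a component is missing,
# so an empty variable shows up as an error instead of a file named ".txt".
sub log_path {
    my ($base, $id, $platform) = @_;
    for my $part ($id, $platform) {
        die "Missing path component\n" unless defined $part && length $part;
    }
    return "${base}${id}/${platform}.txt";
}

print log_path('/home/admin/www/build/logs/baseline', 45, 'linux'), "\n";
```

The returned string can then be handed to a three-argument open, e.g. open my $fh, '>', $path or die $!, which also keeps the mode separate from the file name.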
I think either of these should work.
Option 1, with single-quoted pieces:
open TEXT, '>/home/admin/www/build/logs/baseline' . $ID . '/' . $platformName . '.txt' or die $!;
Option 2, with double-quoted pieces:
open TEXT, ">/home/admin/www/build/logs/baseline" . $ID . "/" . $platformName . ".txt" or die $!;

Questions About Perl Filename Wildcard

I am using Perl to process some text files. I want to use a Perl filename wildcard to find all the useful files in a folder and process them one by one, but there are spaces in the filenames, and the filename wildcard does not handle those filenames properly. Here is my code:
my $term = "Epley maneuver";
my @files = <rawdata/*$term*.csv>;
my %infiles;
foreach my $infilename (@files) {
    if($infilename =~ m/(\d+)_.*\.csv/)
    {
        $infiles{$infilename} = $1;
        print $1."\n";
    }
}
The filename are like:
34_Epley maneuver_2012_4_6.csv
33_Epley maneuver_2012_1_3.csv
32_Epley maneuver_2011_10_12.csv
...
They are in a folder named "rawdata".
When I use this for terms that don't contain spaces, like "dizzy", it works well. But when the term contains a space, it just stops working. I searched for this on Google, but found little useful information.
What happens and how can I do this correctly?
Any help will be good. Thanks a lot.
The glob operator works like the command-line processor. If you write <rawdata/*Epley maneuver*.csv> it will look for files that match rawdata/*Epley or maneuver*.csv
You must put your glob expression in double-quotes:
my @files = <"rawdata/*$term*.csv">;
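Here is a small self-contained sketch of the quoted form, using a temporary directory in place of rawdata. The helper name ids_for_term and the dizzy file name are made up; it assumes the directory path and the term contain no double quotes or glob metacharacters.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: map each matching CSV path to the numeric id that
# starts its file name.
sub ids_for_term {
    my ($dir, $term) = @_;
    my %id_for;
    # The inner double quotes keep glob from splitting the pattern at spaces.
    for my $path (glob(qq{"$dir/*$term*.csv"})) {
        # Anchor on the last path component so digits in $dir are not picked up.
        $id_for{$path} = $1 if $path =~ m{/(\d+)_[^/]*\.csv\z};
    }
    return %id_for;
}

# Throwaway directory standing in for "rawdata".
my $dir = tempdir(CLEANUP => 1);
for my $name ('34_Epley maneuver_2012_4_6.csv',
              '33_Epley maneuver_2012_1_3.csv',
              '12_dizzy_2012_1_3.csv') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

my %found = ids_for_term($dir, 'Epley maneuver');
print join(' ', sort values %found), "\n";
```

Only the two Epley maneuver files should be picked up, so the ids printed are 33 and 34.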

How can I download via FTP all files with the current date in their name?

I have files with names similar to "IDY03101.200901110500.axf". About 25 such files reside in an FTP repository, and I want to download only those for the current date. I have used the following code, which is not working, and I believe the regular expression is incorrect.
my @ymb_date = "IDY.*\.$year$mon$mday????\.axf";
foreach my $file ( @ymb_date)
{
    print STDOUT "Getting file: $file\n";
    $ftp->get($file) or warn $ftp->message;
}
Any help appreciated.
EDIT:
I need all of the current day's files.
Using ls -lt | grep "Jan 13" works on the UNIX box but not in the script.
What would be a valid regex in this scenario?
It doesn't look like you're using any regular expression. You're trying to use the literal pattern as the filename to download.
Perhaps you want to use the ls method of Net::FTP to get the list of files then filter them.
foreach my $file ( $ftp->ls ) {
next unless $file =~ m/$regex/;
$ftp->get($file);
}
You might also like the answers that talk about implementing mget for "Net::FTP" at Perlmonks.
Also, I think you want the regex that finds four digits after the date. In Perl, you could write that as \d{4}. The ? is a quantifier in Perl, so four of them in a row don't work.
IDY.*\.$year$mon$mday\d{4}\.axf
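The filtering step can be tried without an FTP server by running it over a plain list of names. In this sketch the helper name files_for_date and the sample listing are assumptions; strftime is used so the date always comes out zero-padded as YYYYMMDD, since the raw localtime fields count months from 0 and years from 1900.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: keep only names of the form IDY<anything>.<date><4 digits>.axf.
sub files_for_date {
    my ($date, @names) = @_;
    my $re = qr/^IDY.*\.\Q$date\E\d{4}\.axf\z/;
    return grep { $_ =~ $re } @names;
}

# Today's date as YYYYMMDD.
my $today = strftime('%Y%m%d', localtime);

# Made-up listing standing in for what the server would return.
my @listing = ("IDY03101.${today}0500.axf", 'IDY03101.200901110500.axf', 'README.txt');
print "$_\n" for files_for_date($today, @listing);
```

In the real script the listing would presumably come from $ftp->ls, with $ftp->get called on each name that survives the filter.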
I do not think regexes work like that. Though it has been awhile since I did Perl, so I could be way off there. :)
Does your $ftp object have access to an mget() method? If so, maybe try this?
$ftp->mget("IDY*.$year$mon$mday*.axf") or warn $ftp->message;