Perl File Name Change

I am studying and extending a Perl script written by others. It has a line:
@pub = `ls $sourceDir | grep '\.htm' | grep -v Default | head -550`;
foreach (@pub) {
    my $docName = $_;
    chomp($docName);
    $docName =~ s/\.htm$//g;
    ...
}
I understand that it first uses a UNIX command to pull out all the .htm files, then strips off the file extension.
Now I need to do one more thing, which is also very important: I need to change the names of the actual files stored on disk, replacing whitespace with underscores. I am stuck here because I am not sure whether I should follow the original author's style and do this with UNIX commands, or do it in Perl. The point is that I need to modify the real files on disk, not the strings that hold the file names.
Thanks.

Something like this should help (not tested):
use File::Basename;
use File::Spec;
use strict;
my @files = grep { ! /Default/ } glob("$sourceDir/*.htm");
# I didn't implement the "head -550" part as I don't understand the point,
# but you can easily do it with the splice() function.
foreach my $file (@files) {
    next unless -f $file;                # don't rename directories!
    my $dirname = dirname($file);        # file's directory, so we rename only the file itself
    my $file_name = basename($file);     # file name for renaming
    my $new_file_name = $file_name;
    $new_file_name =~ s/ /_/g;           # replace all spaces with underscores
    rename($file, File::Spec->catfile($dirname, $new_file_name))
        or die $!;                       # error handling: what if we couldn't rename?
}

It will be faster to use File::Copy to move the file to its new name than to shell out: forking off a new process and spawning a new shell takes more memory and is slower than doing the work within Perl itself.
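For illustration, a rough sketch of the same rename done with File::Copy's move(), assuming $sourceDir is set as in the question:
use strict;
use warnings;
use File::Basename;
use File::Copy qw(move);
use File::Spec;
foreach my $file (grep { !/Default/ } glob "$sourceDir/*.htm") {
    next unless -f $file;                          # skip directories
    (my $new_name = basename($file)) =~ s/ /_/g;   # spaces -> underscores
    move($file, File::Spec->catfile(dirname($file), $new_name))
        or warn "Could not move $file: $!";
}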
Edit: you can get rid of all that backtick business, too, like this:
my @files = grep { !/Default/ } glob "$sourceDir/*.htm";


Using chop in grep expression

My Perl script searches a directory of file names, using grep to output only file names without the numbers 2-9 in their names. That means, as intended, that file names ending with the number "1" will also be returned. However, I want to use the chop function to output these file names without the "1", but can't figure out how. Perhaps the grep and chop functions can be combined in one line of code to achieve this? Please advise. Thanks.
Here's my Perl script:
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/Users/jdm/Desktop/xampp/htdocs/cnc/images/plants';
opendir(DIR, $dir);
my @files = grep(/^[^2-9]*\.png\z/, readdir(DIR));
foreach my $file (@files) {
    print "$file\n";
}
Here's the output:
Ilex_verticillata.png
Asarum_canadense1.png
Ageratina_altissima.png
Lonicera_maackii.png
Chelone_obliqua1.png
Here's my desired output with the number "1" removed from the end of file names:
Ilex_verticillata.png
Asarum_canadense.png
Ageratina_altissima.png
Lonicera_maackii.png
Chelone_obliqua.png
The number 1 to remove is at the end of the name, before the extension; this is different from filtering out names with the digits 2-9 altogether, and I wouldn't try to fit both into one operation.
Instead, once you have your filtered list (no 2-9 in the names), clip off that trailing 1. Since all names of interest are .png, you can simply use a regex:
$filename =~ s/1\.png\z/.png/;
and if there is no 1 right before .png the string is unchanged. If it were possible to have other extensions involved then you should use a module to break up the filename.
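For instance, a sketch with the core module File::Basename (the qr// suffix pattern is my assumption about what counts as an extension):
use File::Basename qw(fileparse);
my ($name, $dirs, $ext) = fileparse($filename, qr/\.[^.]*/);
$name =~ s/1\z//;                  # clip a trailing 1 from the stem
$filename = $dirs . $name . $ext;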
To incorporate this, you can pass grep's output through a map
opendir my $dfh, $dir or die "Can't open $dir: $!";
my @files =
    map  { s/1\.png\z/.png/r }
    grep { /^[^2-9]*\.png\z/ }
    readdir $dfh;
where I've also introduced a lexical directory filehandle instead of a bareword one, and added a check on whether opendir worked. The /r modifier on the substitution inside the map is needed so that the string is returned (changed, or unchanged if the regex didn't match) rather than modified in place, which is what we need here.
This passes over the list of filenames twice, though, while one can use a straight loop. In principle that may impact performance; however, here all operations are done on each element of a list so a difference in performance is minimal.
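For reference, a minimal sketch of that single-pass loop, under the same assumptions as above:
opendir my $dfh, $dir or die "Can't open $dir: $!";
my @files;
while (defined(my $name = readdir $dfh)) {
    next unless $name =~ /^[^2-9]*\.png\z/;   # same filter as the grep
    $name =~ s/1\.png\z/.png/;                # clip a trailing 1, if present
    push @files, $name;
}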
You could use the following:
s/1//g for @files;
It's also possible to integrate a solution into your chain using map.
my @files =
    map s/1//rg,
    grep /^[^2-9]*\.png\z/,
    readdir(DIR);

create new file listing all text files in a directory with perl

I am trying to list out all the text files in a directory using Perl. The script below runs, but the resulting file is empty. This seems close, but maybe it is not what I need. Thank you :).
get_list.pl
#!/bin/perl
# create a list of all *.txt files in the current directory
opendir(DIR, ".");
#files = grep(/\..txt$/,readdir(DIR));
closedir(DIR);
# print all the filenames in our array
foreach $file (#files) {
print "$file\n";
}
As written, your grep is wrong:
@files = grep(/\..txt$/, readdir(DIR));
In regular expressions, an unescaped . means "any character". So you will find a file called
fish.mtxt
but not a file called
fish.txt
because of that extra dot.
You probably want grep(/\.txt$/, readdir(DIR)).
But personally, I wouldn't bother, and just use glob instead.
foreach my $file (glob "*.txt") {
    print $file, "\n";
}
Also, turn on use strict; use warnings;. Consider them mandatory until you know why you want to turn them off. (There are occasions, but you'll know what they are if you ever REALLY need to.)
You have one excess dot:
@files = grep(/\..txt$/, readdir(DIR));
should be:
@files = grep(/\.txt$/, readdir(DIR));
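Putting the fixes together, a minimal corrected script might look like this (a sketch, using glob as suggested above):
#!/usr/bin/perl
use strict;
use warnings;
# print a list of all *.txt files in the current directory
foreach my $file (glob "*.txt") {
    print "$file\n";
}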

Get the path for a similarly named file in perl, where only the extension differs?

I'm trying to write an Automator service, so I can chuck this into a right-click menu in the gui.
I have a filepath to a txt file, and there is a similarly named file that varies only in the file extension. This can be a pdf or a jpg, or potentially any other extension; there is no way to know beforehand. How can I get the filepath to this other file (there will only be one such file)?
$other_name =~ s/txt$/!(txt)/;
$other_name =~ s/ /?/g;
my @test = glob "$other_name";
In Bash, I'd just turn on the extglob option, change the "txt" at the end to "!(txt)", and then do glob expansion. But I'm not even sure that's available in Perl. And since the filepaths always have spaces (they're in one of the near-root directory names), that further complicates things. I've read through the glob() documentation at http://perldoc.perl.org/functions/glob.html and tried every variation of quoting (the example code above shows my attempt after having given up, where I just turn every space into a ? wildcard).
It seems like I'm able to put modules inside the script, so this doesn't have to be bare perl (just ran a test).
Is there an elegant or at least simple way to accomplish this?
You can extract everything in the filename up to extension, then run a glob with that and filter out the unneeded .txt. This is one of those cases where you need to protect the pattern in the glob with a double set of quotes, for spaces.
use warnings;
use strict;
use feature qw(say);
my $file = "dir with space/file with spaces.txt";
# Pull the full name without extension
my ($basefname) = $file =~ m/(.*)\.txt$/;
# Get all files with that name and filter out unneeded (txt)
my @other_exts = grep { not /\.txt$/ } glob(qq{"$basefname.*"});
say for #other_exts;
With a toy structure like this
dir space/
    file with spaces.pdf
    file with spaces.txt
The output is
dir space/file with spaces.pdf
This recent post has more on related globs.
Perl doesn't allow the not substring construct in glob. You have to find all files with the same name and any extension, and remove the one ending with .txt
This program shows the idea. It splits the original file name into a stem part and a suffix part, and uses the stem to form a glob pattern. The grep removes any result that ends with the original suffix
It picks only the first matching file name if there is more than one candidate. $other_name will be set to undef if no matching file was found
The original file name is expected as a parameter on the command line
The result is printed to STDOUT; I don't know what you need for your right-click menu
The line use File::Glob ':bsd_glob' is necessary if you are working with file paths that contain spaces, as it seems you are
use strict;
use warnings 'all';
use File::Glob ':bsd_glob';
my ($stem, $suffix) = shift =~ /(.*)(\..*)/;
my ($other_name) = grep ! /$suffix$/i, glob "$stem.*";
print "$other_name\n" if defined $other_name;
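For instance, saved as other_ext.pl (a name I've made up), it could be run as:
perl other_ext.pl 'dir space/file with spaces.txt'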
Here is an example based on the core module File::Basename:
use File::Basename;
my $fullname = "/path/to/my/filename.txt";
my ($name, $path, $suffix) = fileparse($fullname, qw/.txt/);
my $new_filename = $path . $name . ".pdf";
# $name --> filename
# $path --> /path/to/my/
# $suffix --> .txt
# $new_filename --> /path/to/my/filename.pdf

How can I create a new output file for each subfolder under a main folder using perl?

I have 100 subfolders in a main folder. They have different names. Each subfolder includes a .txt file, which has 10 columns. I want to get a new .txt file for each subfolder. Each new .txt file must be in its own folder; that is, I will have two .txt files (old and new) in each subfolder. I am trying to select the lines starting with "ATOM" and columns 2, 6, 7, and 8 from each .txt file. My code is below. It doesn't work correctly: it doesn't create a new .txt file. How can I fix this?
#!/usr/bin/perl
$search_text = "ATOM";
@files = <*/*.txt>;
foreach $file (@files) {
    print $file . "\n";
    open(DATA, $file);
    open(OUT_FILE, ">$file a.txt");
    while ($line = <DATA>)
    {
        @fields = split /\s+/, $line;
        if ($line =~ m/$search_text/)
        {
            print OUT_FILE "$fields[2]\t$fields[6]\t$fields[7]\t$fields[8]\n";
        }
    }
}
close(OUT_FILE);
To put the output file a.txt into the same directory as the input file, you need to extract the directory name from the input file name, and prepend it to the output file name (a.txt). There are a couple of ways you can do that; probably the simplest is to use dirname() from the standard module File::Basename:
use File::Basename;
my $dir = dirname($file);
open(OUT_FILE, ">", "$dir/a.txt") or die "Failed to open $dir/a.txt: $!";
or you could use File::Spec directly:
use File::Spec;
my ($volume, $dir) = File::Spec->splitpath($file);
my $outname = File::Spec->catpath($volume, $dir, 'a.txt');
open(OUT_FILE, ">", $outname) or die "Failed to open $outname: $!";
or you could just use a regexp substitution:
my $outname = ( $file =~ s![^/]+$!a.txt!r );
open(OUT_FILE, ">", $outname) or die "Failed to open $outname: $!";
Ps. In any case, I'd recommend adopting several good habits that will help you write better Perl scripts:
Always start your scripts with use strict; and use warnings;. Fix any errors and warnings they produce. In particular, declare all your local variables with my to make them lexically scoped.
Check the return value of functions like open(), and abort the script if they fail. (I've done this in my examples above.)
Use the three-argument form of open(), as I also did in my examples above. It's a lot less likely to break if your filenames contain funny characters.
Consider using lexically scoped file handles (open my $out_file, ...) instead of global file handles (open OUT_FILE, ...). I didn't do that in my code snippets above, because I wanted to keep them compatible with the rest of your code, but it would be good practice.
If you're pre-declaring a regular expression, like your $search_text, use qr// instead of a plain string, like this:
my $search_text = qr/ATOM/;
It's slightly more efficient, and the quoting rules for special characters are much saner.
For printing multiple columns from an array, consider using join() and a list slice, as in:
print OUT_FILE join("\t", @fields[2,6,7,8]), "\n";
Finally, if I were you, I'd reconsider my file naming scheme: the output file name a.txt matches your input file name glob *.txt, so your script will likely break if you run it twice in a row.
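Putting these suggestions together, here's a rough sketch of the whole script (untested; the output name atoms.tsv is my own invention, chosen so it won't match the */*.txt input glob):
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

my $search_text = qr/ATOM/;

foreach my $file (glob '*/*.txt') {
    print "$file\n";
    my $outname = dirname($file) . '/atoms.tsv';   # .tsv so reruns don't pick it up
    open my $in,  '<', $file    or die "Failed to open $file: $!";
    open my $out, '>', $outname or die "Failed to open $outname: $!";
    while (my $line = <$in>) {
        next unless $line =~ $search_text;
        my @fields = split /\s+/, $line;
        print {$out} join("\t", @fields[2, 6, 7, 8]), "\n";
    }
}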

Is there a simple way to do bulk file text substitution in place?

I've been trying to code a Perl script to substitute some text on all source files of my project. I'm in need of something like:
perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi" *.{cs,aspx,ascx}
But I need it to parse all the files of a directory recursively.
I just started a script:
use File::Find::Rule;
use strict;
my @files = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
foreach my $f (@files) {
    if ($f =~ s/thisgoesout/thisgoesin/gi) {
        # in-place file editing, or something like that
    }
}
But now I'm stuck. Is there a simple way to edit all files in place using Perl?
Please note that I don't need to keep a copy of every modified file; I have 'em all subversioned =)
Update: I tried this on Cygwin,
perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi" {*,*/*,*/*/*}.{cs,aspx,ascx}
But it looks like my arguments list exploded to the maximum size allowed. In fact, I'm getting very strange errors on Cygwin...
If you assign @ARGV before using *ARGV (aka the diamond <>), $^I/-i will work on those files instead of what was specified on the command line.
use File::Find::Rule;
use strict;
@ARGV = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
$^I = '.bak';   # or set -i in the #! line or on the command line
while (<>) {
    s/thisgoesout/thisgoesin/gi;
    print;
}
This should do exactly what you want.
If your pattern can span multiple lines, add undef $/; before the <> so that Perl operates on a whole file at a time instead of line by line.
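A minimal sketch of that slurp-mode variant (the multi-line pattern here is made up for illustration):
undef $/;                                # slurp mode: <> now reads each file whole
while (<>) {                             # @ARGV and $^I set as in the block above
    s/this\s+goes\s+out/thisgoesin/gi;   # \s matches newlines, so this can span lines
    print;
}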
You may be interested in File::Transaction::Atomic or File::Transaction
The SYNOPSIS for F::T::A looks very similar to what you're trying to do:
# In this example, we wish to replace
# the word 'foo' with the word 'bar' in several files,
# with no risk of ending up with the replacement done
# in some files but not in others.
use File::Transaction::Atomic;
my $ft = File::Transaction::Atomic->new;
eval {
    foreach my $file (@list_of_file_names) {
        $ft->linewise_rewrite($file, sub {
            s#\bfoo\b#bar#g;
        });
    }
};
if ($@) {
    $ft->revert;
    die "update aborted: $@";
}
else {
    $ft->commit;
}
Couple that with the File::Find you've already written, and you should be good to go.
You can use Tie::File to scalably access large files and change them in place. See the manpage (man 3perl Tie::File).
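A minimal sketch of that approach (line-oriented, so it assumes the pattern doesn't span lines; @files as gathered with File::Find::Rule above):
use strict;
use warnings;
use Tie::File;

foreach my $f (@files) {
    tie my @lines, 'Tie::File', $f or die "Can't tie $f: $!";
    s/thisgoesout/thisgoesin/gi for @lines;   # each change is written straight back to the file
    untie @lines;
}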
Change
foreach my $f (@files) {
    if ($f =~ s/thisgoesout/thisgoesin/gi) {
        # in-place file editing, or something like that
    }
}
To
foreach my $f (@files) {
    open my $in, '<', $f;
    open my $out, '>', "$f.out";
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/thisgoesout/thisgoesin/gi;
        print $out "$line\n";
    }
}
This assumes that the pattern doesn't span multiple lines. If the pattern might span lines, you'll need to slurp in the file contents. ("slurp" is a pretty common Perl term).
The chomp isn't actually necessary; I've just been bitten one too many times by lines that weren't chomped (if you drop the chomp, change print $out "$line\n"; to print $out $line;).
Likewise, you can change open my $out, '>', "$f.out"; to open my $out, '>', undef; to open a temporary file and then copy that file back over the original when the substitution's done. In fact, and especially if you slurp in the whole file, you can simply make the substitution in memory and then write over the original file. But I've made enough mistakes doing that that I always write to a new file, and verify the contents.
Note, I originally had an if statement in that code. That was most likely wrong. That would have only copied over lines that matched the regular expression "thisgoesout" (replacing it with "thisgoesin" of course) while silently gobbling up the rest.
You could use find:
find . -name '*.{cs,aspx,ascx}' | xargs perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi"
This will list all the filenames recursively, then xargs will read its stdin and run the remainder of the command line with the filenames appended on the end. One nice thing about xargs is it will run the command line more than once if the command line it builds gets too long to run in one go.
Note that I'm not sure whether find completely understands all the shell methods of selecting files, so if the above doesn't work then perhaps try:
find . | grep -E '(cs|aspx|ascx)$' | xargs ...
When using pipelines like this, I like to build up the command line and run each part individually before proceeding, to make sure each program is getting the input it wants. So you could run the part without xargs first to check it.
It just occurred to me that although you didn't say so, you're probably on Windows due to the file suffixes you're looking for. In that case, the above pipeline could be run using Cygwin. It's possible to write a Perl script to do the same thing, as you started to do, but you'll have to do the in-place editing yourself because you can't take advantage of the -i switch in that situation.
Thanks to ephemient on this question and on this answer, I got this:
use File::Find::Rule;
use strict;
sub ReplaceText {
    my $regex = shift;
    my $replace = shift;
    @ARGV = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
    $^I = '.bak';
    while (<>) {
        s/$regex/$replace->()/gie;
        print;
    }
}
ReplaceText qr/some(crazy)regexp/, sub { "some $1 text" };
Now I can even loop through a hash containing regexp => sub entries!