How to handle filenames with spaces? - perl

I use Perl on Windows (ActivePerl). I have a Perl program that globs the files in the current folder and concatenates them all using the DOS copy command, called from within the script using system().
When I run it, I get a DOS error saying "The system cannot find the file specified." It's related to the spaces in my filenames.
This is the Perl code:
@files = glob "*.mp3";
$outfile = 'final.mp3';
$firsttime = 1;
foreach (@files)
{
    if($firsttime == 1)
    {
        @args = ('copy' ,"/b ","$_","+","$outfile", "$outfile");
        system (@args);
        #system("copy /b '$_'+$outfile $outfile");
        $firsttime = 0;
    }
    else
    {
        @args = ('copy' ,"/b ","$outfile","+","$_", "$outfile");
        system (@args);
        #system("copy /b $outfile+'$_' $outfile");
    }
}
glob returns an array of the filenames in my current folder. Those filenames contain spaces, so the array elements do too. When I use system(...) to run my copy command on those array elements using "$_" as shown above, I get the error above.
I tried a couple of ways of calling system(...), without success.
I would like to know:
1] How can I get the code above working on filenames that contain spaces? How do I 'escape' the white space in the filenames?
2] Is there an alternative solution in Perl that achieves the same thing? (Simple ones welcome.)

Stop using system() to make a call that can be done with a portable library. Perl has the File::Copy module; use that instead and you don't have to worry about things like this, plus you get much better OS portability.

Your code doesn't add any quotes around the filenames.
Try
"\"$_\""
and
"\"$outfile\""

system is rarely the right answer; use File::Copy.
To concatenate all files:
use File::Copy;
my @in = glob "*.mp3";
my $out = "final.mp3";
open my $outh, ">", $out or die "Cannot open $out: $!";
binmode $outh; # MP3 data is binary
for my $file (@in) {
    next if $file eq $out;
    copy($file, $outh) or die "Cannot copy $file: $!";
}
close $outh;

Issues may arise when you try to access the variable $_ inside an inner block. The safest way is to change:
foreach (@files)
to:
foreach my $file (@files)
Then make the necessary changes to @args, escaping the double quotes so that they are included in the strings:
@args = ('copy' ,"/b ","\"$file\"","+","$outfile", "$outfile");
...
@args = ('copy' ,"/b ","$outfile","+","\"$file\"", "$outfile");

On Windows you can normally put double quotes around filenames (and/or paths) that contain special characters, i.e. "long file names".
C:\"my long path\this is a file.mp3"
Edit:
Does this not work?
system("copy /b \"$_\"+$outfile $outfile");
(NOTE THE DOUBLE quotes within the string not single quotes)
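To make the quoting concrete, here is a minimal sketch that only builds and prints the command string instead of running it. The helper name win_quote and the sample file name are made up for illustration, and it assumes the names contain no double quotes themselves (Windows does not allow them in file names).

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one file name in double quotes for cmd.exe.
sub win_quote {
    my ($name) = @_;
    return qq{"$name"};
}

my $outfile = 'final.mp3';
my $file    = 'my first song.mp3';   # made-up name containing spaces

# Shape of the command: copy /b "source1"+"source2" "destination"
my $cmd = sprintf 'copy /b %s+%s %s',
    win_quote($outfile), win_quote($file), win_quote($outfile);

print "$cmd\n";
# On Windows the single-string form could then be run with: system($cmd);
```

Quoting every name, not only the ones that happen to contain spaces, keeps the command correct if the output file is renamed later.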

$filename =~ s/ /\\ /g;
Whatever the filename is, just use a backslash to escape the spaces.

The built-in rename function also moves files:
rename $source, $destination; # ...and move
I use this on windows all the time.

Related

Get the path for a similarly named file in perl, where only the extension differs?

I'm trying to write an Automator service, so I can chuck this into a right-click menu in the gui.
I have a filepath to a txt file, and there is a similarly named file that varies only in the file extension. This can be a pdf or a jpg, or potentially any other extension, no way to know beforehand. How can I get the filepath to this other file (there will only be one such)?
$other_name =~ s/txt$/!(txt)/;
$other_name =~ s/ /?/g;
my @test = glob "$other_name";
In Bash, I'd just turn on the extglob option, change the "txt" at the end to "!(txt)", and then do glob expansion. But I'm not even sure if that's available in Perl. And since the filepaths always have spaces (it's in one of the near-root directory names), that further complicates things. I've read through the glob() documentation at http://perldoc.perl.org/functions/glob.html and tried every variation of quoting (the above example code shows my attempt after having given up, where I just replace all the spaces with a wildcard).
It seems like I'm able to put modules inside the script, so this doesn't have to be bare perl (just ran a test).
Is there an elegant or at least simple way to accomplish this?
You can extract everything in the filename up to extension, then run a glob with that and filter out the unneeded .txt. This is one of those cases where you need to protect the pattern in the glob with a double set of quotes, for spaces.
use warnings;
use strict;
use feature qw(say);
my $file = "dir with space/file with spaces.txt";
# Pull the full name without extension
my ($basefname) = $file =~ m/(.*)\.txt$/;
# Get all files with that name and filter out unneeded (txt)
my @other_exts = grep { not /\.txt$/ } glob(qq{"$basefname.*"});
say for @other_exts;
With a toy structure like this
dir with space/
    file with spaces.pdf
    file with spaces.txt
The output is
dir with space/file with spaces.pdf
This recent post has more on related globs.
Perl's glob doesn't support a "not this substring" construct. You have to find all files with the same name and any extension, and remove the one ending with .txt.
This program shows the idea. It splits the original file name into a stem part and a suffix part, and uses the stem to form a glob pattern. The grep removes any result that ends with the original suffix.
It picks only the first matching file name if there is more than one candidate. $other_name will be set to undef if no matching file was found.
The original file name is expected as a parameter on the command line.
The result is printed to STDOUT; I don't know what you need for your right-click menu.
The line use File::Glob ':bsd_glob' is necessary if you are working with file paths that contain spaces, as it seems you are.
use strict;
use warnings 'all';
use File::Glob ':bsd_glob';
my ($stem, $suffix) = shift =~ /(.*)(\..*)/;
my ($other_name) = grep ! /\Q$suffix\E$/i, glob "$stem.*";
$other_name =~ tr/ /?/;
print $other_name, "\n";
This is an example, based on File::Basename core module
use File::Basename;
my $fullname = "/path/to/my/filename.txt";
my ($name, $path, $suffix) = fileparse($fullname, qw/.txt/);
my $new_filename = $path . $name . ".pdf";
# $name --> filename
# $path --> /path/to/my/
# $suffix --> .txt
# $new_filename --> /path/to/my/filename.pdf
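fileparse on its own only helps when the other extension is already known. A rough sketch of combining it with bsd_glob to discover the sibling file follows; the helper name siblings_of is invented here, and the demo creates its own throwaway files so it does not depend on anything from the question.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Glob qw(bsd_glob);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: files that share $fullname's stem but not its suffix.
sub siblings_of {
    my ($fullname) = @_;
    my ($name, $path, $suffix) = fileparse($fullname, qr/\.[^.]*/);
    # bsd_glob treats the whole string as one pattern, spaces included.
    return grep { !/\Q$suffix\E$/i } bsd_glob("$path$name.*");
}

# Demo in a temporary directory whose name contains spaces.
my $dir = tempdir('dir with space XXXX', TMPDIR => 1, CLEANUP => 1);
for my $ext (qw(txt pdf)) {
    my $f = File::Spec->catfile($dir, "file with spaces.$ext");
    open my $fh, '>', $f or die "Cannot create $f: $!";
    close $fh;
}

my @others = siblings_of(File::Spec->catfile($dir, 'file with spaces.txt'));
print "$_\n" for @others;   # the .pdf sibling only
```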

Perl script to rename files with spaces in name, pushd/popd equivalent?

My Linux system mounts some Samba shares, and some files are deposited by Windows users. The names of these files sometimes contain spaces and other undesirable characters. Changing these characters to hyphens - seems like a reasonable solution. Nothing else needs to be changed to handle these cleaned file names.
A couple of questions,
What other characters besides spaces and parentheses should be translated?
What other file attributes (besides file type (file/dir) and permissions) should be checked?
Does Perl offer a pushd/popd equivalent, or is chdir a reasonable solution to traversing the directory tree?
This is my Perl program
#!/bin/env perl
use strict;
use warnings;
use File::Copy;
#rename files, map characters (not allowed) to allowed characters
#map [\s\(\)] to "-"
my $verbose = 2;
my $pat = "[\\s\\(\\)]";
sub clean {
    my ($name) = @_;
    my $name2 = $name;
    $name2 =~ s/$pat/\-/g;
    #skip when unchanged, collision
    return $name if (($name eq $name2) || -e $name2); #skip collisions
    print "r: $name\n" if ($verbose > 2);
    rename($name, $name2);
    $name2;
}
sub pDir {
    my ($obj) = @_;
    return if (!-d $obj);
    return if (!opendir(DIR, $obj));
    print "p: $obj/\n" if ($verbose > 2);
    chdir($obj);
    foreach my $dir (readdir DIR) {
        next if ($dir =~ /^\.\.?$/); #skip ./, ../
        pDir(clean($dir));
    }
    closedir(DIR);
    chdir("..");
}
sub main {
    foreach my $argv (@ARGV) {
        print "$argv/\n" if ($verbose > 3);
        $argv = clean($argv);
        if (-d $argv) { pDir($argv); }
    }
}
&main();
These posts are related, but don't really address my questions,
Use quotes: How to handle filenames with spaces? (using other scripts, prefer removing need for quotes)
File::Find perl script to recursively list all filename in directory (yes, but I have other reasons)
URL escaping: Modifying a Perl script which has an error handling spaces in files (not urls)
Quotemeta: How can I safely pass a filename with spaces to an external command in Perl? (not urls)
Here's a different way to think about the problem:
Perl has a built-in rename function. You should use it.
Create a data structure mapping old names to new names. Having this data will allow various sanity checks: for example, you don't want cleaned names stomping over existing files.
Since you aren't processing the directories recursively, you can use glob to good advantage. No need to go through the hassles of opening directories, reading them, filtering out dot-dirs, etc.
Don't invoke subroutines with a leading ampersand (search this issue for more details).
Many Unix-like systems include a Perl-based rename command for quick-and-dirty renaming jobs. It's good to know about even if you don't use it for your current project.
Here's a rough outline:
use strict;
use warnings;
sub main {
    # Map the input arguments to oldname-newname pairs.
    my @renamings =
        map { [$_, cleaned($_)] }
        map { -f $_ ? $_ : glob("$_/*") }
        @_;
    # Sanity checks first.
    # - New names should be unique.
    # - New should not already exist.
    # - ...
    # Then rename.
    for my $rnm (@renamings){
        my ($old, $new) = @$rnm;
        rename($old, $new) unless $new eq $old;
    }
}
sub cleaned {
    # Allowed characters: word characters, hyphens, periods, slashes.
    # Adjust as needed.
    my $f = shift;
    $f =~ s/[^\w\-\.\/]/-/g;
    return $f;
}
main(@ARGV);
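The sanity checks left as comments in the outline could look something like this. It is only a sketch: the helper name renaming_problems and the sample names are invented, and it covers just the two checks the outline names (unique new names, no existing target).

```perl
use strict;
use warnings;

# Hypothetical helper: given [old, new] pairs, list the problems that
# should stop the run before any file is renamed.
sub renaming_problems {
    my @renamings = @_;
    my (%seen, @problems);
    for my $rnm (@renamings) {
        my ($old, $new) = @$rnm;
        next if $new eq $old;    # nothing to rename
        push @problems, "duplicate target: $new" if $seen{$new}++;
        push @problems, "target exists: $new"    if -e $new;
    }
    return @problems;
}

my @problems = renaming_problems(
    ['a b.txt',  'a-b.txt'],
    ['a(b.txt',  'a-b.txt'],     # cleans to the same name as the first pair
    ['same.txt', 'same.txt'],    # unchanged, so skipped
);
print "$_\n" for @problems;
```

Running the checks over the whole list first means a collision is reported before anything on disk has changed.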
Don't blame Windows for your problems. Linux is much more lax, and the only character it prohibits from its file names is NUL.
It isn't clear exactly what you are asking. Did you publish your code for a critique, or are you having problems with it?
As for the specific questions you asked,
What other characters besides spaces and parentheses should be translated?
Windows allows any character in its filenames except for control characters from 0x00 to 0x1F and any of < > : " / \ | ? *
DEL at 0x7F is fine.
Within the ASCII set, that leaves ! # $ % & ' ( ) + , - . ; = @ [ ] ^ _ ` { } ~
The set of characters you need to translate depends on your reason for doing this. You may want to start by excluding non-ASCII characters, so your code should read something like
$name2 =~ tr/\x21-\x7E/-/c;
which will change all non-ASCII characters, control characters, spaces and DEL to hyphens. Then you need to go ahead and fix all the ASCII characters that you consider undesirable.
What other file attributes (besides file type (file/dir) and permissions) should be checked?
The answer to this has to be according to your purpose. If you are referring only to whether renaming a file or directory as required is possible, then I suggest that you just let rename itself tell you whether it succeeded. It will return a false value if the operation failed, and the reason will be in $!.
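A small sketch of letting rename report its own success, run inside a temporary directory so it touches nothing real. The file names are made up for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $name = 'report (final).txt';            # made-up sample name
open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
close $fh;

# Clean only the file name, not the directory part.
(my $clean = $name) =~ s/[\s()]/-/g;        # report--final-.txt

rename "$dir/$name", "$dir/$clean"
    or warn "Could not rename '$name': $!\n";

# Renaming a file that does not exist returns false and sets $!.
my $ok = rename "$dir/missing.txt", "$dir/other.txt";
print "second rename failed: $!\n" unless $ok;
```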
Does Perl offer a pushd/popd equivalent, or is chdir a reasonable solution to traversing the directory tree?
If you want to work with that idiom, then you should take a look at File::pushd, which allows you to temporarily chdir to a new location. A popd is done implicitly at the end of the enclosing block.
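File::pushd is a CPAN module, so it may need installing. If that is not an option, the same idea can be approximated with core modules only. The sketch below is not how File::pushd is implemented; the helper name in_dir is invented for illustration.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Hypothetical helper: run $code inside $dir, then return to the
# starting directory even if $code dies.
sub in_dir {
    my ($dir, $code) = @_;
    my $orig = getcwd();
    chdir $dir or die "Cannot chdir to $dir: $!";
    my @result = eval { $code->() };
    my $err = $@;
    chdir $orig or die "Cannot chdir back to $orig: $!";
    die $err if $err;
    return @result;
}

my $start = getcwd();
my $tmp   = tempdir(CLEANUP => 1);
my ($inside) = in_dir($tmp, sub { getcwd() });
print "inside:  $inside\n";
print "back in: ", getcwd(), "\n";
```

Returning to a saved absolute path is also sturdier than chdir("..") when a directory is renamed or reached through a symlink during the walk.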
I hope this helps. If you have any other specific questions then please make them known by editing your original post.

Can NOT List directory including space using Perl in Windows Platform

In order to list paths on Windows, I wrote the Perl function below (run under Strawberry Perl).
sub listpath
{
    my $path = shift;
    my @list = glob "$path/*";
    #my @list = <$path/*>;
    my @pathes = grep { -d and $_ ne "." and $_ ne ".." } @list;
}
But it can't handle a directory name that includes a space correctly. For example, when I ran the following code:
listpath("e:/test/test1/test11/test111/test1111/test11111 - Copy");
The function returned an array with two elements:
1: e:/test/test1/test11/test111/test1111/test11111
2: -
I am wondering whether glob can handle directory names with spaces like the one above. Thanks a lot.
Try bsd_glob instead:
use File::Glob ':glob';
my @list = bsd_glob "$path/*";
Even though this question was answered a long time ago, I recently encountered the same problem, and a quick search gave me another solution, from PerlMonks (last reply):
my $path = shift;
$path =~ s/ /\\ /g;
my @list = glob "$path/*";
But prefer bsd_glob; it also supports a couple of other neat features, such as [] for character classes.
The question is about Windows platform, where Bentoy13's solution does not work because the backslash would be mistaken for a path separator.
Here's an option if for whatever reason you don't want to go with bsd_glob: wrap the offending part of the path in double quotes. This can be one directory name (path\\"to my"\\file.txt) or several directory names ("path\\to my"\\file.txt). A slash instead of a backslash usually works, too. Of course, the quoted parts don't have to include a space, so this always works:
my @list = glob "\"$path\"/*";
Remember, this is a Windows solution. Whether it works under Linux depends on context.
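The three variants discussed in this thread can be compared side by side. This is only a sketch: it builds its own temporary directory with a space in the name (the names are made up) instead of using the e:/test path from the question.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

my $path = tempdir('test - Copy XXXX', TMPDIR => 1, CLEANUP => 1);
mkdir "$path/sub dir" or die "mkdir failed: $!";

my @split  = glob "$path/*";        # pattern is split on whitespace
my @quoted = glob qq{"$path"/*};    # inner double quotes keep it whole
my @bsd    = bsd_glob "$path/*";    # never splits on whitespace

print "plain glob:  ", scalar(@split), " result(s), none of them the subdirectory\n";
print "quoted glob: $_\n" for @quoted;
print "bsd_glob:    $_\n" for @bsd;
```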

Perl File Name Change

I am studying and extending a Perl script written by others. It has a line:
@pub=`ls $sourceDir | grep '\.htm' | grep -v Default | head -550`;
foreach (@pub) {
    my $docName = $_;
    chomp($docName);
    $docName =~ s/\.htm$//g;
    ............}
I know that it first uses a UNIX command to pick out all the htm files, then strips the file extension.
Now I need to do one more thing, which is also very important: I need to rename the actual files on disk, replacing the white space with underscores. I am stuck here because I am not sure whether I should follow his code style and do this with UNIX commands, or do it in Perl. The point is that I need to modify the real file on the disk, not the string that holds the file name.
Thanks.
Something like this should help (not tested)
use File::Basename;
use File::Spec;
use File::Copy;
use strict;
my @files = grep { ! /Default/ } glob("$sourceDir/*.htm");
# I didn't implement the "head -550" part as I don't understand the point.
# But you can easily do it using the `splice()` function.
foreach my $file (@files) {
    next unless (-f $file); # Don't rename directories!
    my $dirname = dirname($file); # file's directory, so we rename only the file itself.
    my $file_name = basename($file); # File name for renaming.
    my $new_file_name = $file_name;
    $new_file_name =~ s/ /_/g; # replace all spaces with underscores
    rename($file, File::Spec->catfile($dirname, $new_file_name))
        or die $!; # Error handling - what if we couldn't rename?
}
It will be faster to use File::Copy to move the file to its new name than to use a method that forks off a new process and spawns a new shell; that takes more memory and is slower than doing it within Perl itself.
Edit: you can get rid of all that backtick business, too, like this:
my @files = grep {!/Default/} glob "$sourceDir/*.htm";

Is there a simple way to do bulk file text substitution in place?

I've been trying to code a Perl script to substitute some text on all source files of my project. I'm in need of something like:
perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi" *.{cs,aspx,ascx}
But that parses all the files of a directory recursively.
I just started a script:
use File::Find::Rule;
use strict;
my @files = (File::Find::Rule->file()->name('*.cs','*.aspx','*.ascx')->in('.'));
foreach my $f (@files){
    if ($f =~ s/thisgoesout/thisgoesin/gi) {
        # In-place file editing, or something like that
    }
}
But now I'm stuck. Is there a simple way to edit all files in place using Perl?
Please note that I don't need to keep a copy of every modified file; I have 'em all subversioned =)
Update: I tried this on Cygwin,
perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi" {*,*/*,*/*/*}.{cs,aspx,ascx}
But it looks like my argument list exploded past the maximum size allowed. In fact, I'm getting very strange errors on Cygwin...
If you assign @ARGV before using *ARGV (aka the diamond <>), $^I/-i will work on those files instead of what was specified on the command line.
use File::Find::Rule;
use strict;
@ARGV = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
$^I = '.bak'; # or set `-i` in the #! line or on the command-line
while (<>) {
    s/thisgoesout/thisgoesin/gi;
    print;
}
This should do exactly what you want.
If your pattern can span multiple lines, add an undef $/; before the <> so that Perl operates on a whole file at a time instead of line by line.
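A minimal sketch of that whole-file variant, using local so the settings do not leak into the rest of the program. The sample file and its contents are made up, and the pattern is adapted with \s+ so that it really does span a line break.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a made-up sample file to edit.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.cs";
open my $fh, '>', $file or die "Cannot create $file: $!";
print {$fh} "this\ngoes out\nkeep me\n";
close $fh;

{
    local @ARGV = ($file);
    local $^I   = '.bak';    # edit in place, keep a backup copy
    local $/;                # undef: read each file in one piece
    while (<>) {
        s/this\s+goes\s+out/thisgoesin/gi;   # match runs across the newline
        print;
    }
}

open $fh, '<', $file or die "Cannot read $file: $!";
my $content = do { local $/; <$fh> };
close $fh;
print $content;
```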
You may be interested in File::Transaction::Atomic or File::Transaction
The SYNOPSIS for F::T::A looks very similar with what you're trying to do:
# In this example, we wish to replace
# the word 'foo' with the word 'bar' in several files,
# with no risk of ending up with the replacement done
# in some files but not in others.
use File::Transaction::Atomic;
my $ft = File::Transaction::Atomic->new;
eval {
    foreach my $file (@list_of_file_names) {
        $ft->linewise_rewrite($file, sub {
            s#\bfoo\b#bar#g;
        });
    }
};
if ($@) {
    $ft->revert;
    die "update aborted: $@";
}
else {
    $ft->commit;
}
Couple that with the File::Find you've already written, and you should be good to go.
You can use Tie::File to scalably access large files and change them in place. See the manpage (man 3perl Tie::File).
Change
foreach my $f (@files){
    if ($f =~ s/thisgoesout/thisgoesin/gi) {
        #inplace file editing, or something like that
    }
}
To
foreach my $f (@files){
    open my $in, '<', $f;
    open my $out, '>', "$f.out";
    while (my $line = <$in>){
        chomp $line;
        $line =~ s/thisgoesout/thisgoesin/gi;
        print $out "$line\n";
    }
}
This assumes that the pattern doesn't span multiple lines. If the pattern might span lines, you'll need to slurp in the file contents. ("slurp" is a pretty common Perl term).
The chomp isn't actually necessary, I've just been bitten by lines that weren't chomped one too many times (if you drop the chomp, change print $out "$line\n"; to print $out $line;).
Likewise, you can change open my $out, '>', "$f.out"; to open my $out, '>', undef; to open a temporary file and then copy that file back over the original when the substitution's done. In fact, and especially if you slurp in the whole file, you can simply make the substitution in memory and then write over the original file. But I've made enough mistakes doing that that I always write to a new file, and verify the contents.
Note, I originally had an if statement in that code. That was most likely wrong. That would have only copied over lines that matched the regular expression "thisgoesout" (replacing it with "thisgoesin" of course) while silently gobbling up the rest.
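The "write to a new file, then put it over the original" approach might be sketched like this. The helper name replace_in_file and the sample file are made up; the temporary file is created in the same directory as the original so that the final rename stays on one filesystem.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempdir tempfile);

# Hypothetical helper: rewrite $file line by line through a temporary file.
sub replace_in_file {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot read $file: $!";
    my ($out, $tmpname) = tempfile(DIR => dirname($file));
    while (my $line = <$in>) {
        $line =~ s/thisgoesout/thisgoesin/gi;
        print {$out} $line;          # line still carries its newline
    }
    close $in;
    close $out or die "Cannot finish $tmpname: $!";
    rename $tmpname, $file or die "Cannot replace $file: $!";
    return;
}

# Demo on a made-up file in a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/Page.aspx";
open my $fh, '>', $file or die "Cannot create $file: $!";
print {$fh} "keep\nThisGoesOut here\n";
close $fh;

replace_in_file($file);

open $fh, '<', $file or die "Cannot read $file: $!";
my @lines = <$fh>;
close $fh;
print @lines;
```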
You could use find:
find . -name '*.{cs,aspx,ascx}' | xargs perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi"
This will list all the filenames recursively, then xargs will read its stdin and run the remainder of the command line with the filenames appended on the end. One nice thing about xargs is it will run the command line more than once if the command line it builds gets too long to run in one go.
Note that I'm not sure whether find completely understands all the shell methods of selecting files, so if the above doesn't work then perhaps try:
find . | grep -E '(cs|aspx|ascx)$' | xargs ...
When using pipelines like this, I like to build up the command line and run each part individually before proceeding, to make sure each program is getting the input it wants. So you could run the part without xargs first to check it.
It just occurred to me that although you didn't say so, you're probably on Windows due to the file suffixes you're looking for. In that case, the above pipeline could be run using Cygwin. It's possible to write a Perl script to do the same thing, as you started to do, but you'll have to do the in-place editing yourself because you can't take advantage of the -i switch in that situation.
Thanks to ephemient on this question and on this answer, I got this:
use File::Find::Rule;
use strict;
sub ReplaceText {
    my $regex = shift;
    my $replace = shift;
    @ARGV = (File::Find::Rule->file()->name('*.cs','*.aspx','*.ascx')->in('.'));
    $^I = '.bak';
    while (<>) {
        s/$regex/$replace->()/gie;
        print;
    }
}
ReplaceText qr/some(crazy)regexp/, sub { "some $1 text" };
Now I can even loop through a hash containing regexp=>subs entries!