Using Perl glob with spaces in the pattern - perl

I am trying to zip files from a directory. It works well except when the file name has spaces.
Since glob splits its parameter on spaces, I also tried bsd_glob but it did not work.
How do I handle spaces in the file names? I am seeking to retrieve all files.
use File::Copy qw(copy);
use File::Glob qw(bsd_glob);
#Directory of focus
my $log = 'C:/Users/me/Desktop/log';
my @files = bsd_glob( $log.'/*.*' );
#Copy contents to new directory to be zipped
foreach my $file (@files) {
copy($file, $logout) or die
"Failed to copy $file: $!\n";
}
This fails to copy. My second attempt, using readdir, fails too:
# Create Child tmp
my $out = 'C:/Users/me/Desktop/out';
mkdir $out;
# Directory of focus
my $log = 'C:/Users/me/Desktop/log';
opendir (DIR, $log) or die $!;
while ( my $file = readdir(DIR) ) {
next if $file =~ /^\./;
#print "$file\n";
copy($file, $out) or die "Failed to copy $file: $!\n";
}
closedir (DIR);

There is no problem with that part of your code: spaces in the names of the files that glob finds don't matter, only spaces in the pattern that you pass to it as a parameter. I notice that you wrote in a comment on Matt Jacob's post
I'm sorry, the process works. Thank you! Apparently the file is opened elsewhere
so I imagine that was the problem all along. But I thought it would be useful to explain how to get glob to cope with a pattern that contains spaces.
Behaviour of glob with spaces
I would write
my @files = glob "$log/*.*";
because I think it is clearer, but the string you're passing to glob is C:/Users/me/Desktop/log/*.*, which has no spaces, so glob is fine.
If you had a space in the path somewhere, then you're right: glob would split at those spaces and treat each part as a separate parameter. Say you had
my @files = glob "C:/Program Files/*";
then you would get the list ('C:/Program'), because glob checks whether a file exists only if there is a wildcard in the pattern. So we get back the first part, C:/Program, which doesn't have a wildcard, but the second part contributes nothing more because there are no files matching Files/* in the current directory.
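To see the splitting happen, here is a small throwaway sketch. It assumes a Unix-like system whose temporary directory path has no spaces; the "Program Files" layout is made up for the demonstration and stands in for the Windows path above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical layout for the demo: <tmp>/Program Files/a.txt
my $root = tempdir(CLEANUP => 1);
make_path("$root/Program Files");
open my $fh, '>', "$root/Program Files/a.txt" or die "create: $!";
close $fh;

# The default glob splits this pattern on the space, so it is treated as
# two patterns: "$root/Program" (no wildcard, so returned as-is) and
# "Files/*" (relative to the current directory, normally matching nothing)
my @found = glob "$root/Program Files/*";

print "glob returned: $_\n" for @found;
```

The file that really exists, a.txt, is never found, because neither half of the split pattern describes it.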
Solution using quotes
The solution in this case is to wrap patterns that contain spaces in a pair of quotation marks (either single or double). So either of
my @files = glob "'C:/Program Files/*'";
or
my @files = glob '"C:/Program Files/*"';
will work fine. But if you want to interpolate a path like your C:/Users/me/Desktop/out, then the outermost quotes must be double quotes. In your case that would look like
my $log = 'C:/Users/me/Desktop/log';
my @files = glob "'$log/*.*'";
but I prefer to use the alternative qq operator, like this:
my $log = 'C:/Users/me/Desktop/log';
my @files = glob qq{"$log/*.*"};
Solution using bsd_glob
The alternative, as you point out in your question, is to add
use File::Glob 'bsd_glob';
to the top of your code and use the bsd_glob function instead, which treats spaces in the pattern the same as any other character and doesn't split on them.
Or if you have
use File::Glob ':bsd_glob';
(note the additional colon), then the standard glob call will behave the same way as bsd_glob, which allows you to use the angle bracket form of glob like this
my @files = <C:/Program Files/*>;
without any problems.
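Both fixes can be compared side by side. This sketch again uses a made-up temporary "Program Files" directory (assumed to live on a path without spaces), this time holding one file whose own name also contains a space; the quoted glob and bsd_glob should return the same two paths.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);   # imports the function only; plain glob is unchanged
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical layout for the demo: <tmp>/Program Files/{a.txt, b c.txt}
my $root = tempdir(CLEANUP => 1);
make_path("$root/Program Files");
for my $name ('a.txt', 'b c.txt') {
    open my $fh, '>', "$root/Program Files/$name" or die "create $name: $!";
    close $fh;
}

# Fix 1: protect the spacey pattern with inner double quotes;
# qq{} still interpolates $root
my @quoted = glob qq{"$root/Program Files/*"};

# Fix 2: bsd_glob does not split the pattern on whitespace at all
my @bsd = bsd_glob("$root/Program Files/*");

print "quoted:   $_\n" for @quoted;
print "bsd_glob: $_\n" for @bsd;
```

The quoted form keeps the familiar glob call; bsd_glob avoids the extra quoting, which is simpler when the path comes from a variable you don't control.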

Don't use glob. Use readdir instead (or File::Find if you need recursion).
opendir (my $dh, $log) or die $!;
while (my $file = readdir($dh)) {
next if $file =~ /^\./;
copy("$log/$file", $out) or die "Failed to copy $file: $!\n";
}
closedir($dh);
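For a self-contained version of the same idea, the sketch below copies every regular file from one directory to another. The two temporary directories are hypothetical stand-ins for the question's log and out folders, catfile (from File::Spec::Functions) replaces plain string concatenation, and the -f test assumes subdirectories should be skipped; those choices are mine, not part of the answer above.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

# Hypothetical stand-ins for the question's log and out directories
my $log = tempdir(CLEANUP => 1);
my $out = tempdir(CLEANUP => 1);
for my $name ('plain.txt', 'with space.txt') {
    open my $fh, '>', catfile($log, $name) or die "create $name: $!";
    print {$fh} "data\n";
    close $fh;
}

opendir(my $dh, $log) or die "opendir $log: $!";
while (defined(my $file = readdir $dh)) {
    next if $file =~ /^\./;             # skip . and .. (and dotfiles)
    my $src = catfile($log, $file);     # readdir gives bare names, so add the path
    next unless -f $src;                # copy regular files only
    copy($src, $out) or die "Failed to copy $src: $!\n";
}
closedir $dh;

# List what arrived in the destination directory
opendir(my $check, $out) or die "opendir $out: $!";
my @copied = grep { !/^\./ } readdir $check;
closedir $check;
@copied = sort @copied;
print "copied: $_\n" for @copied;
```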

Related

Unable to open files returned by readdir in Perl [duplicate]

I have a problem with a Perl script, as follows.
I must open and analyze all the *.txt files in a directory, but I cannot.
I can read the file names that are saved in the @files array and print them, but I cannot open those files for reading.
This is my code:
my $dir= "../Scrivania/programmi" ;
opendir my ($dh), $dir;
my @files = grep { -f and /\.txt/i } readdir $dir;
closedir $dh;
for my $file ( @files ) {
$file = catfile($dir, $file);
print qq{Opening "$file"\n};
open my $fh, '<', $file;
# Do stuff with the data from $fh
print "sono nel foreach\n";
print " in : "."$fh\n";
#open(CANALI,$fh);
#@righe=<CANALI>;
#close(CANALI);
#print "canali:"."@righe\n";
#foreach $canali (@righe)
#{
# $canali =~ /\d\d:\d\d (-) (.*)/;
# $ora= $1;
#
# if($hhSplit[0] == $ora)
# {
# push(@output, "$canali");
#
# }
#}
}
The main problem you have is that the file names returned by readdir have no path, so you're trying to open, say, x.txt when you should be opening ../Sc/direct/x.txt. The file doesn't exist in the current working directory, so your open call fails.
You also have a strange mixture of stuff in glob("$dir/(.*).txt/"), which looks a little like a regex pattern, but glob doesn't understand regexes. The value of $dir is a directory handle left open from the opendir on the first line. What you should be using is glob '../Sc/direct/*.txt', but then there's no need for the readdir.
There are two ways to find the contents of a directory. You can use opendir and readdir to read everything in the directory, or you can use glob.
The first method returns only the bare name of each entry, which means you must concatenate each name with the path to the containing directory, preferably using catfile from File::Spec::Functions. It also includes the pseudo-directories . and .., so you must filter those out before you can use the list of names.
glob has neither of these disadvantages. All the strings it returns are real directory entries, and they will include a path if you provided one in the pattern you passed as a parameter.
You seem to have become rather muddled over the two, so I have written this program, which differentiates between the two approaches. I hope it makes things clearer.
use strict;
use warnings;
use v5.10.1;
use autodie;
use File::Spec::Functions qw/ catfile /;
my $dir = '../Sc/direct';
### Using glob
for my $file ( glob catfile($dir, '*.txt') ) {
print qq{Opening "$file"\n};
open my $fh, '<', $file;
# Do stuff with the data from $fh
}
### Using opendir / readdir
opendir my ($dh), $dir;
# readdir returns bare names, so build the full path before testing with -f
my @files = grep { /\.txt$/i and -f catfile($dir, $_) } readdir $dh;
closedir $dh;
for my $file ( @files ) {
$file = catfile($dir, $file);
print qq{Opening "$file"\n};
open my $fh, '<', $file;
# Do stuff with the data from $fh
}
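If you want to check the two differences for yourself, this sketch builds a throwaway directory (a hypothetical stand-in for ../Sc/direct, with made-up file names) and compares what readdir and glob return.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

# Hypothetical directory holding two .txt files and one other file
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.txt', 'b.txt', 'notes.md') {
    open my $fh, '>', catfile($dir, $name) or die "create $name: $!";
    close $fh;
}

# readdir: bare names, including the . and .. pseudo-directories
opendir my $dh, $dir or die "opendir $dir: $!";
my @raw = readdir $dh;
closedir $dh;

# glob: results already carry the directory prefix, and "*" skips dot entries
my @globbed = glob catfile($dir, '*.txt');

print "readdir: $_\n" for sort @raw;
print "glob:    $_\n" for @globbed;
```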
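Using $dir in the glob is incorrect. $dir is a GLOB type, not a string value. Rather, you should be looping over the @files array and looking for names that match what you want. Maybe something like so:
foreach my $fp (@files) {
    if ($fp =~ /\.txt$/) {
        print "$fp is a .txt\n";
        # readdir gives bare names, so prepend the directory path (a string, not the handle)
        open (my $in, "<", "../Scrivania/programmi/$fp") or die "Cannot open $fp: $!";
        while (<$in>) {
            # process each line in $_
        }
        close $in;
    }
}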

Perl: can't get time stamps from files in directory --> Use of uninitialized value in line 18

My goal: list the *.gz files in a directory with name and creation date.
I wrote the following:
#!/usr/bin/perl
use strict;
use warnings;
use File::stat;
use Time::localtime;
my $directory = '/home/hans/.config/cqrlog/database';
opendir (DIR, $directory) or die $!;
my @files = (readdir(DIR));
closedir(DIR);
foreach $_ (@files) {
# Use a regular expression to find files ending with .gz
if ($_ =~ m/\.gz$/) {
my $file_name = $_;
my $file_time = (stat($_))[9];
print "$file_time\n";
}
}
But I do keep getting the often seen error "Use of uninitialized value $file_time in concatenation (.) or string at ./perl-matching-files.pl line 18." which is the print line.
I also tried the following:
foreach $_ (@files) {
# Use a regular expression to find files ending with .gz
if ($_ =~ m/\.gz$/) {
my $file_name = $_;
my @file_time_array = (stat($_));
my $file_time = $file_time_array[9];
print $file_name , " - " , $file_time , "\n";
}
}
But again it barfs at the last print line. I also tried a while loop, but with the same results. The file names are printed out, though, so I must be doing something right. I feel that when reading through the array the time stamp of the file is not read, but I am not enough of an expert to know what is going wrong. It always seems to come down to the print line. Any insight is appreciated. Cheers.
Instead of
my $file_time = (stat($_))[9];
try
my $file_time = (stat("$directory/$_"))[9];
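otherwise you're looking for the files of /home/hans/.config/cqrlog/database in the current directory, which works ONLY if you're already in that directory. (Also note that because your code loads File::stat, stat returns an object rather than a list, so the slice (...)[9] will still be undefined; use ->[9] or ->mtime on the result instead.)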
stat returns the empty list if it fails. Therefore consider testing the return value, especially when facing a problem like yours:
my $st = stat($_) or die "No $_: $!";
This would have died with:
No <filename.gz>: No such file or directory at ...
As mpapec already pointed out, this is because you aren't including the path information in the stat call. There are three possible solutions:
1) chdir to the directory you're iterating over:
chdir $directory or die "Can't chdir to $directory: $!";
2) Use a glob instead of readdir:
#!/usr/bin/perl
use strict;
use warnings;
use File::stat;   # makes stat return an object, needed for $st->[9] below
my $directory = '/home/hans/.config/cqrlog/database';
for my $file_name (glob("$directory/*.gz")) {
my $st = stat($file_name) or die "No $file_name: $!";
my $file_time = $st->[9];
print "$file_time\n";
}
3) Or manually add the path to build the fully qualified file name (fqfn):
my @file_time_array = stat("$directory/$_") or die "No $_: $!";
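Putting the glob approach and File::stat together, a minimal sketch might look like this. The temporary directory and file names are hypothetical stand-ins for /home/hans/.config/cqrlog/database, and utime is used only to give one file a known timestamp. Note that this reports the modification time (mtime): the traditional stat fields contain no creation time (ctime is the inode change time).

```perl
use strict;
use warnings;
use File::stat;            # stat() now returns an object with named accessors
use File::Temp qw(tempdir);

# Hypothetical stand-in for /home/hans/.config/cqrlog/database
my $directory = tempdir(CLEANUP => 1);
for my $name ('a.gz', 'b.gz', 'c.txt') {
    open my $fh, '>', "$directory/$name" or die "create $name: $!";
    close $fh;
}
# Give a.gz a known modification time (2001-09-09 01:46:40 UTC)
utime 1_000_000_000, 1_000_000_000, "$directory/a.gz" or die "utime: $!";

my %mtime;
for my $file_name (glob "$directory/*.gz") {      # glob keeps the path
    my $st = stat($file_name) or die "No $file_name: $!";
    $mtime{$file_name} = $st->mtime;
    printf "%s - %s\n", $file_name, scalar localtime($st->mtime);
}
```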
Thank you guys. After two days I got it figured out.
You were both right about the path not being specified enough. Fixed that.
Miller: the glob thing worked after I added use File::stat. I had never worked with globs, so thanks for steering me in that direction. Learned a lot from it. Cheers.
In the end I tried the OOP interface for stat after fiddling for an hour with single file examples:
my $file_time = stat("$directory/$file_name")->mtime;
This got me what I wanted, so I tried the same method with the array element number:
my $file_time = (stat("$file_name"))->[9] or die "No $_: $!";
This also worked. So it all came down to adding "->"
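The arrow mattered because File::stat replaces the built-in stat with one that returns a single object. This sketch (using a hypothetical temporary file with a fixed timestamp) shows that a list slice on that result yields undef, while ->[9], ->mtime and CORE::stat all agree.

```perl
use strict;
use warnings;
use File::stat;            # overrides the built-in stat in this package
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;
utime 1_000_000_000, 1_000_000_000, $path or die "utime: $!";

# With File::stat loaded, stat returns ONE object, so the slice
# picks element 9 of a one-element list: undef
my $slice = (stat($path))[9];

# The object is a blessed array reference, so ->[9] and ->mtime both work
my $arrow  = stat($path)->[9];
my $method = stat($path)->mtime;

# CORE::stat bypasses the override and returns the usual 13-element list
my $core = (CORE::stat($path))[9];

printf "slice=%s arrow=%d method=%d core=%d\n",
    defined $slice ? $slice : 'undef', $arrow, $method, $core;
```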
This is my final code that works. I know it can be prettier/better/more efficient, but for now it is fine with me, because I wrote it myself. Time to get on with some additions because it is going to be a script only run on my own machine to handle some automation tasks.
#!/usr/bin/perl
use strict;
use warnings;
use File::stat;
use Time::localtime;
my $directory = '/home/hans/.config/cqrlog/database';
opendir (DIR, $directory) or die $!;
my @files = (readdir(DIR));
closedir(DIR);
foreach $_ (@files) {
# Use a regular expression to find files ending with .gz
if ($_ =~ m/\.gz$/) {
# my $file_time = stat("$directory/$_")->mtime;
my $file_time = (stat("$directory/$_"))->[9] or die "No $_: $!";
print "$_\n";
print "$file_time\n";
}
}

List content of a directory except hidden files in Perl

My code displays all files within the directory, but I need it not to display hidden files such as "." and "..".
opendir(D, "/var/spool/postfix/hold/") || die "Can't open directory: $!\n";
while (my $f = readdir(D))
{
print "MailID :$f\n";
}
closedir(D);
It sounds as though you might be wanting to use the glob function rather than readdir:
while (my $f = </var/spool/postfix/hold/*>) {
print "MailID: $f\n";
}
<...> is an alternate way of globbing; you can also just use the function directly:
while (my $f = glob "/var/spool/postfix/hold/*") {
print "MailID: $f\n";
}
This will automatically skip the hidden files.
Just skip the files you don't want to see:
while (my $f = readdir(D))
{
next if $f eq '.' or $f eq '..';
print "MailID :$f\n";
}
On a Linux system, "hidden" files and folders are those starting with a dot.
It is best to use lexical directory handles (and file handles).
It is also important to always use strict and use warnings at the start of every Perl program you write.
This short program uses a regular expression to check whether each name starts with a dot.
use strict;
use warnings;
opendir my $dh, '/var/spool/postfix/hold' or die "Can't open directory: $!\n";
while ( my $node = readdir($dh) ) {
next if $node =~ /^\./;
print "MailID: $node\n";
}
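Here is a small sketch comparing the two approaches on a throwaway directory standing in for /var/spool/postfix/hold (the file names are made up). Both skip the dot entries; the difference is that readdir gives bare names while glob keeps the directory prefix.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for /var/spool/postfix/hold
my $dir = tempdir(CLEANUP => 1);
for my $name ('ABC123', 'DEF456', '.hidden') {
    open my $fh, '>', "$dir/$name" or die "create $name: $!";
    close $fh;
}

# readdir + regex: bare names, dot entries filtered out explicitly
opendir my $dh, $dir or die "Can't open directory: $!";
my @ids = grep { !/^\./ } readdir $dh;
closedir $dh;
@ids = sort @ids;

# glob: "*" does not match names starting with a dot, but paths are kept
my @paths = glob "$dir/*";

print "MailID: $_\n" for @ids;
```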

Perl script to make first 8 characters all caps but not the whole file name

What Perl script should I be using to only change the first 8 characters in a file name to all caps instead of the script changing the entire file name to all caps?
Here is how I am setting it up:
#!/usr/bin/perl
chdir "directory path";
#@files = `ls *mw`;
@files = `ls | grep mw`;
chomp @files;
foreach $oldname (@files) {
$newname = $oldname;
$newname =~ s/mw//;
print "$oldname -> $newname\n";
rename("$oldname","$newname");
}
You can use this regex:
my $str = 'Hello World!';
$str =~ s/^(.{8})/uc($1)/se; # $str now contains 'HELLO WOrld!'
The substitution
s/^(.{1,8})/\U$1/
will set the first eight characters of a string (or all of them, if the string is shorter) to upper case. The complete program looks like this:
use strict;
use warnings;
chdir "directory path" or die "Unable to change current directory: $!";
opendir my $dh, '.' or die $!;
my @files = grep -f && /mw/, readdir $dh;
foreach my $file (@files) {
(my $new = $file) =~ s/mw//;
$new =~ s/^(.{1,8})/\U$1/s;
print "$file -> $new\n";
rename $file, $new;
}
How about:
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
chdir '/path/to/directory' or die "Unable to change directory: $!";
# Find all files that contain 'mw'
my @files = glob("*mw*");
foreach my $file(@files) {
# skip directories
next if -d $file;
# remove 'mw' from the filename
(my $FILE = $file) =~ s/mw//;
# Uppercase the first 8 characters (or the whole name if it is shorter)
$FILE =~ s/^(.{1,8})/uc $1/se;
move($file, $FILE) or warn "Cannot move $file to $FILE: $!";
}
As the documentation for rename says, you'd better use File::Copy's move to be platform independent.
Always check return values of system calls!
When you make any call to OS services, you should always check the return value. For example, the Perl documentation for chdir says:
chdir EXPR
chdir FILEHANDLE
chdir DIRHANDLE
chdir
Changes the working directory to EXPR, if possible. If EXPR is omitted, changes to the directory specified by $ENV{HOME}, if set; if not, changes to the directory specified by $ENV{LOGDIR}. (Under VMS, the variable $ENV{SYS$LOGIN} is also checked, and used if it is set.) If neither is set, chdir does nothing. It returns true on success, false otherwise. See the example under die.
On systems that support fchdir(2), you may pass a filehandle or directory handle as the argument. On systems that don't support fchdir(2), passing handles raises an exception.
As written in your question, your code discards important information: whether system calls chdir and rename succeeded or failed.
Providing useful error messages
An example of a common idiom for checking return values in Perl is
chdir $path or die "$0: chdir $path: $!";
The error message contains three important bits of information:
the program emitting the error, $0
what it was trying to do, chdir in this case
why it failed, $!
Also note that die appends the name of the file and the line number where program control was if your error message does not end with a newline. When the chdir fails, the standard error will resemble
./myprogram: chdir: No such file or directory at ./myprogram line 3.
Logical or is true when at least one of its arguments is true. The “do something or die” idiom works because if chdir above fails, it returns a false value, which forces or to evaluate the right-hand side, and die terminates execution. In the happy case where chdir succeeds and returns a true value, there is no need to evaluate the right-hand side because we already have one true argument to logical or.
Suggested improvements to your code
For what you’re doing, I recommend using readdir to avoid problems in case one of the filenames contains whitespace. Note the defined test in the code below that’s there to stop a file named 0 (i.e., a single zero character) terminating your loop.
#! /usr/bin/env perl
use strict;
use warnings;
chdir "directory path" or die "$0: chdir: $!";
opendir my $dh, "." or die "$0: opendir: $!";
while (defined(my $oldname = readdir $dh)) {
    my $newname;
    next unless ($newname = $oldname) =~ s/mw//;
    $newname =~ s/^(.{1,8})/\U$1/;
    rename $oldname, $newname or die "$0: rename $oldname, $newname: $!";
}
For the rename to have any hope, you have to preserve the value of $oldname, so right away, the code above copies it to $newname and starts changing the copy rather than the original. You will see
($new = $old) =~ s/.../.../; # or /.../
in Perl code, so it is also an important idiom to understand.
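A tiny sketch of the idiom, using one of the file names from the example output, next to the /r modifier (Perl 5.14 and later), which returns the modified copy directly:

```perl
use v5.14;      # needed for the /r modifier; also enables strict
use warnings;

my $old = 'defmwghijk';

# Copy-then-modify: the parentheses make the assignment happen first,
# and the substitution then applies to $new, leaving $old untouched
(my $new = $old) =~ s/mw//;

# /r returns the modified copy instead of the number of substitutions
my $new_r = $old =~ s/mw//r;

print "$old -> $new ($new_r)\n";
```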
The perlop documentation defines handy escape sequences for use in strings and regex substitutions:
\l lowercase next character only
\u titlecase (not uppercase!) next character only
\L lowercase all characters till \E seen
\U uppercase all characters till \E seen
\Q quote non-word characters till \E
\E end either case modification or quoted section (whichever was last seen)
The code above grabs the first eight characters (or fewer if $newname is shorter in length) and replaces them with their upcased counterparts.
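The escapes can be tried out on a made-up file name. This sketch shows \U running to the end of the replacement, \u affecting only one character, and \E ending a case modification inside a string.

```perl
use strict;
use warnings;

my $name = 'defghijk.txt';

(my $upper8 = $name) =~ s/^(.{1,8})/\U$1/;   # \U runs to the end of the replacement
(my $title  = $name) =~ s/^(\w+)/\u$1/;      # \u affects only the next character
my $mixed = "\Uabc\Edef \LGHI\E";            # \E ends each case modification

print "$upper8\n$title\n$mixed\n";
```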
Example output
See the code in action:
$ ls directory\ path/
defmwghijk mwabc nochange qrstuvwxyzmw
$ ./prog
$ ls directory\ path/
ABC DEFGHIJK QRSTUVWXyz nochange
I figure there's more to your requirements than you're telling us, such as not uppercasing parts of the file extension. Instead of matching the first eight characters, I'll match the first eight letters:
use v5.14;
use utf8;
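binmode STDOUT, ':encoding(UTF-8)';   # the → arrow printed below is a wide character
chdir "/Users/brian/test/" or die "chdir: $!";
my @files = glob( 'mw*' );
foreach my $old (@files) {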
my $new = $old =~ s/\Amw(\pL{1,8})/\U$1/ir;
print "$old → $new\n";
}
Some other notes:
You can do the glob directly in Perl. You don't need ls.
It looks like you were stripping off mw, so I did that. If that's not what you want, it's easy to change.
In lieu of a regular expression to up-case the first eight characters you could use the 4-argument form of substr. This offers in situ replacement.
my $old = q(abcdefghij);
my $new = $old;
substr( $new, 0, 8, substr( uc($old), 0, 8 ) );
print "$old\n$new\n";
abcdefghij
ABCDEFGHij
Use rename or File::Copy::move (as M42 showed) to perform the actual rename.

Using Perl to rename files in a directory

I'd like to take a directory and for all email (*.msg) files, remove the 'RE ' at the beginning. I have the following code but the rename fails.
opendir(DIR, 'emails') or die "Cannot open directory";
@files = readdir(DIR);
closedir(DIR);
for (@files){
next if $_ !~ m/^RE .+msg$/;
$old = $_;
s/RE //;
rename($old, $_) or print "Error renaming: $old\n";
}
If your ./emails directory contains these files:
1.msg
2.msg
3.msg
then your @files will look something like ('.', '..', '1.msg', '2.msg', '3.msg'), but your rename wants names like 'emails/1.msg', 'emails/2.msg', etc. So you can chdir before renaming:
chdir('emails');
for (@files) {
#...
}
You'd probably want to check the chdir return value too.
Or add the directory names yourself:
rename('emails/' . $old, 'emails/' . $_) or print "Error renaming $old: $!\n";
# or rename("emails/$old", "emails/$_") if you like string interpolation
# or you could use map if you like map
You might want to combine your directory reading and filtering using grep:
my @files = grep { /^RE .+msg$/ } readdir(DIR);
or even this:
opendir(DIR, 'emails') or die "Cannot open directory";
for (grep { /^RE .+msg$/ } readdir(DIR)) {
(my $new = $_) =~ s/^RE //;
rename("emails/$_", "emails/$new") or print "Error renaming $_ to $new: $!\n";
}
closedir(DIR);
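To try the grep-and-rename loop without touching real mail, this sketch runs it against a temporary directory standing in for emails (the file names are invented). It uses catfile instead of string concatenation and warn instead of print for errors; those are stylistic choices, not requirements.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

# Hypothetical stand-in for the 'emails' directory
my $emails = tempdir(CLEANUP => 1);
for my $name ('RE hello.msg', 'RE RE twice.msg', 'plain.msg') {
    open my $fh, '>', catfile($emails, $name) or die "create $name: $!";
    close $fh;
}

opendir(my $dh, $emails) or die "Cannot open directory $emails: $!";
# readdir runs to completion before the loop, so renaming inside it is safe
for my $old (grep { /^RE .+msg$/ } readdir $dh) {
    (my $new = $old) =~ s/^RE //;                # strip one leading "RE "
    rename(catfile($emails, $old), catfile($emails, $new))
        or warn "Error renaming $old to $new: $!\n";
}
closedir $dh;
```

Only one leading "RE " is removed per run, so "RE RE twice.msg" becomes "RE twice.msg".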
You seem to be assuming glob-like behavior rather than readdir-like behavior.
The underlying readdir system call returns just the filenames within the directory, and will include the two entries . and .. as well. This carries through to the readdir function in Perl, just to give a bit more detail on mu's answer.
Alternatively, there's not much point in using readdir if you're collecting all the results in an array anyway.
@files = glob('emails/*');
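If you go the glob route, the paths come back with the directory already attached, so the "RE " prefix has to be matched against the file's own name rather than the whole path. One way to do that, sketched here with File::Basename's fileparse and a hypothetical temporary directory standing in for emails:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

my $emails = tempdir(CLEANUP => 1);   # hypothetical 'emails' directory
for my $name ('RE one.msg', 'two.msg') {
    open my $fh, '>', catfile($emails, $name) or die "create $name: $!";
    close $fh;
}

# glob returns "$emails/RE one.msg", so no path needs to be added,
# but "^RE " would never match the full path: split off the name first
for my $path (glob catfile($emails, '*.msg')) {
    my ($name, $dir) = fileparse($path);
    next unless $name =~ s/^RE //;
    rename($path, catfile($dir, $name)) or warn "Error renaming $path: $!\n";
}
```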
As already mentioned, your script fails because the path you expect and the path the script uses are not the same.
I would suggest a more transparent usage. Hardcoding a directory is not a good idea, IMO. I learned this one day when I wrote a script with a hardcoded path to alter some original files, and a colleague of mine thought it would be a nice script to borrow to alter his copies. Oops!
Usage:
perl script.pl "^RE " *.msg
i.e. a regex, then a list of file globs, where each path is relative to the current working directory, e.g. *.msg, emails/*.msg, or even /home/pat/emails/*.msg /home/foo/*.msg (multiple globs are possible).
Using the absolute paths will leave the user with no doubt as to which files he'll be affecting, and it will also make the script reusable.
Code:
use strict;
use warnings;
use v5.14;   # the /r substitution modifier below needs Perl 5.14
use File::Copy qw(move);
my $rx = shift; # e.g. "^RE "
if ($^O eq 'MSWin32') { # Patch for Windows' lack of shell globbing
@ARGV = map glob, @ARGV;
}
for (@ARGV) {
if (/$rx/) {
my $new = s/$rx//r; # Using non-destructive substitution
say "Moving $_ to $new ...";
move($_, $new) or die $!;
}
}
I don't know if the regex fits the specific names of the files, but in one line this could be done with:
perl -E'for (</path/to/emails*.*>){ ($new = $_) =~ s/(^RE)(.*$)/$2/; say $_." -> ".$new}'
(say ... is nice for testing, just replace it with rename $_,$new or rename($_,$new) )
<*.*> read every file in the current directory
($new = $_) =~ saves the following substitution in $new and leaves $_ as intact
(^RE) save this match in $1 (optional) and just match files with "RE" at the beginning
(.*$) save everything until and including the end ($) of the line -> into $2
substitute the match with the string in $2