How do I detect a case-insensitive file system in Perl? - perl

I tried using File::Spec->case_tolerant, but it returns false on HFS+, which is wrong. I suspect it's because File::Spec::Unix always returns false. My current workaround is this function:
my $IS_CASE_INSENSITIVE;
sub _is_case_insensitive {
unless (defined $IS_CASE_INSENSITIVE) {
$IS_CASE_INSENSITIVE = 0;
my ($uc) = glob uc __FILE__;
if ($uc) {
my ($lc) = glob lc __FILE__;
$IS_CASE_INSENSITIVE = 1 if $lc;
}
}
return $IS_CASE_INSENSITIVE;
}
But that's a hack since: 1) on a case-sensitive file system both of those files might exists; and 2) different volumes can have different file systems.

In truth, every directory considered must be checked on its own. This is because, on Unix-like systems, any directory can be a different file system than some other directory. Furthermore, use of glob is not very reliable; from perlport:
Don't count on filename globbing. Use opendir, readdir, and closedir instead.
But I think that #borodin is onto something with the use of -e. So here's a function that uses -e to determine whether the specified directory is on a case-insensitive file system:
my %IS_CASE_INSENSITIVE;
sub is_case_insensitive {
my $dir = shift;
unless (defined $IS_CASE_INSENSITIVE{$dir}) {
$IS_CASE_INSENSITIVE{$dir} = -e uc $dir && -e lc $dir;
}
return $IS_CASE_INSENSITIVE{$dir};
}
You could probably add some heuristics for Windows to just cache the value for the drive letter, since that defines a mount point. And of course, it will fail on case-sensisitve file systems if both uppercase and lowercase variations of the directory exist. But otherwise, unless there is some other way to tell more globally which directories match to which mount points, you have to check for any directory.

I suggest you make use of the core File::Temp module to create a new unique file that has lower-case characters in its name. The file is set to be deleted when the object is destroyed, which is when the subroutine exits if not before.
If the file doesn't exist when access by the upper-cased file name then the filing system is case-sensitive.
If the upper-cased name does exist then we have to check that we haven't happened upon a file whose upper-case version was already there, so we delete the file. If the upper-case entry has now gone then the filing system is case-insensitive.
If the upper-cased name is still there then it is a file that existed before we created the temporary file. We just loop around and create a new temporary file with a different name, although the chances of this happening are absolutely tiny. If you prefer you can minimize this possibility even further by using an outlandish value for SUFFIX. Just be careful that the characters you use are valid on any fileing system that you wish to test.
I've tested this on both Windows 7 and Ubuntu.
use strict;
use warnings;
use 5.010;
use autodie;
use File::Temp ();
printf "File system %s case_insensitive\n", case_insensitive() ? "is" : "isn't";
sub case_insensitive {
while () {
my $tmp = File::Temp->new(
TEMPLATE => 'tempXXXXXX',
SUFFIX => '.tmp',
UNLINK => 1,
);
my $uc_filename = uc $tmp->filename;
return 0 if not -e $uc_filename;
$tmp = undef;
return 1 if not -e $uc_filename;
}
}

Related

simple way to test if a given filename is underneath a directory?

Is there a Perl module (preferably core) that has a function that will tell me if a given filename is inside a directory (or a subdirectory of the directory, recursively)?
For example:
my $f = "/foo/bar/baz";
# prints 1
print is_inside_of($f, "/foo");
# prints 0
print is_inside_of($f, "/foo/asdf");
I could write my own, but there are some complicating factors such as symlinks, relative paths, whether it's OK to examine the filesystem or not, etc. I'd rather not reinvent the wheel.
Path::Tiny is not in core, but it has no non-core dependencies, so is a very quick and easy installation.
use Path::Tiny qw(path);
path("/usr/bin")->subsumes("/usr/bin/perl"); # true
Now, it does this entirely by looking at the file paths (after canonicalizing them), so it may or may not be adequate depending on what sort of behaviour you're expecting in edge cases like symlinks. But for most purposes it should be sufficient. (If you want to take into account hard links, the only way is to search through the entire directory structure and compare inode numbers.)
If you want the function to work for only a filename (without a path) and a path, you can use File::Find:
#!/usr/bin/perl
use warnings;
use strict;
use File::Find;
sub is_inside_of {
my ($file, $path) = #_;
my $found;
find( sub { $found = 1 if $_ eq $file }, $path);
return $found
}
If you don't want to check the filesystem, but only process the path, see File::Spec for some functions that can help you. If you want to process symlinks, though, you can't avoid touching the file system.

PERL: String Replacement on file

I am working on a script to do a string replacement in a file and I will read the variables and values and files from a configuration file and do string replacement.
Here is my logic to do a string replacement.
sub expansion($$$){
my $f = shift(#_) ; # file Name
my $vname = shift(#_) ; # variable name for pattern match
my $value = shift(#_) ; # value to replace
my $n = "$f".".new";
open ( O, "<$f") or print( "Can't open $f file: $!");
open ( N ,">$n" ) or print( "Can't open $n file: $!");
while (<O>)
{
$_ =~ s/$vname/$value/g; #check for pattern
print N "$_" ;
}
close (O);
close (N);
}
In my logic am reading line by line in from input file ($f) for the pattern and writing to a new file ($n) .
Instead of write to a new file is there any way to do a string replacement the original file when I try to do the same it has only empty file with no contents.
Do not. Never, ever1. Don't you dare, Don't even think of, do not use subroutine prototyping. It is horribly broken (that is, it doesn't do what you think it does) and is dangerous.
Now, we got that out of the way:
Yes, you can do what you want. You can open a file as both read and writable by using the mode <+. So far, so good.
However, due to buffering, you cannot use the standard read and write methods to read and write to the file. Instead, you need to use sysread and syswrite.
Then, what you need to do is read the line, use sysseek to go back to the start of where you read, and then write to that spot.
Not only is it very complex to do, but it is full of peril. Let's take a simple example. I have a document, and I want to replace my curly quotes with straight quotes.
$line =~ s/“|”/"/g;
That should work. I'm replacing one character with another. What could go wrong?
If this is a UTF-8 file (what Macs and Linux systems use by default), those curly quotes are two-byte characters and that straight quote is a single byte character. I would be writing back a line that was shorter than the line I read in. My buffer is going to be off.
Back in the days when computer memory and storage were measured in kilobytes, and you serial devices like reel-to-reel tapes, this type of operation was quite common. However, in this age where storage is vast, it's simply not worth the complexity and error prone process that this entails. Stick with reading from one file, and writing to another. Then use unlink and rename to delete the original and to rename the copy to the original's name.
A few more pointers:
Don't print if the file can't be opened. Use die. Otherwise, your program will simply continue on blithely unaware that it is not working. Even better, use the pragma use autodie;, and you won't have to worry about testing whether or not a read/write failed.
Use scalars for file handles.
That is instead of
open OUT, ">my_file.txt";
use
open my $out_fh, ">my_file.txt";
And, it is highly recommended to use the three parameter open:
Use
open my $out_fh, ">", "my_file.txt";
If you aren't, always add use strict; and use warnings;.
In fact, your Perl syntax is a bit ancient. You need to get a book on Modern Perl. Perl originally was written as a hack language to replace shell and awk programming. However, Perl has morphed into a full fledge language that can handle complex data types, object orientation, and large projects. Learning the modern syntax of Perl will help you find errors, and become a better developer.
1. Like all rules, this can be broken, but only if you have a clear and careful understanding what is going on. It's like those shows that say "Don't do this at home. We're professionals."
sub inplace_expansion($$$){
my $f = shift(#_) ; # file Name
my $vname = shift(#_) ; # variable name for pattern match
my $value = shift(#_) ; # value to replace
local #ARGV = ( $f );
local $^I = '';
while (<>)
{
s/\Q$vname/$value/g; #check for pattern
print;
}
}
or, my preference would run closer to this (basically equivalent, changes mostly in formatting, variable names, etc.):
use English;
sub inplace_expansion {
my ( $filename, $pattern, $replacement ) = #_;
local #ARGV = ( $filename ),
$INPLACE_EDIT = '';
while ( <> ) {
s/\Q$pattern/$replacement/g;
print;
}
}
The trick with local basically simulates a command-line script (as one would run with perl -e); for more details, see perldoc perlrun. For more on $^I (aka $INPLACE_EDIT), see perldoc perlvar.
(For the business with \Q (in the s// expression), see perldoc -f quotemeta. This is unrelated to your question, but good to know. Also be aware that passing regex patterns around in variables—as opposed to, e.g., using literal regexes exclusively— can be vulnerable to injection attacks; Perl's built-in taint mode is useful here.)
EDIT: David W. is right about prototypes.

Perl script to rename files with spaces in name, pushd/popd equivalent?

My Linux system mounts some Samba shares, and some files are deposited by Windows users. The names of these files sometimes contain spaces and other undesirable characters. Changing these characters to hyphens - seems like a reasonable solution. Nothing else needs to be changed to handle these cleaned file names.
A couple of questions,
What other characters besides spaces, parenthesis should be translated?
What other file attributes (besides file type (file/dir) and permissions) should be checked?
Does Perl offer a pushd/popd equivalent, or is chdir a reasonable solution to traversing the directory tree?
This is my Perl program
#!/bin/env perl
use strict;
use warnings;
use File::Copy;
#rename files, map characters (not allowed) to allowed characters
#map [\s\(\)] to "-"
my $verbose = 2;
my $pat = "[\\s\\(\\)]";
sub clean {
my ($name) = #_;
my $name2 = $name;
$name2 =~ s/$pat/\-/g;
#skip when unchanged, collision
return $name if (($name eq $name2) || -e $name2); #skip collisions
print "r: $name\n" if ($verbose > 2);
rename($name, $name2);
$name2;
}
sub pDir {
my ($obj) = #_;
return if (!-d $obj);
return if (!opendir(DIR, $obj));
print "p: $obj/\n" if ($verbose > 2);
chdir($obj);
foreach my $dir (readdir DIR) {
next if ($dir =~ /^\.\.?$/); #skip ./, ../
pDir(clean($dir));
}
close(DIR);
chdir("..");
}
sub main {
foreach my $argv (#ARGV) {
print "$argv/\n" if ($verbose > 3);
$argv = clean($argv);
if (-d $argv) { pDir($argv); }
}
}
&main();
These posts are related, but don't really address my questions,
Use quotes: How to handle filenames with spaces? (using other scripts, prefer removing need for quotes)
File::Find perl script to recursively list all filename in directory (yes, but I have other reasons)
URL escaping: Modifying a Perl script which has an error handling spaces in files (not urls)
Quotemeta: How can I safely pass a filename with spaces to an external command in Perl? (not urls)
Here's a different way to think about the problem:
Perl has a built-in rename function. You should use it.
Create a data structure mapping old names to new names. Having this data will allow various sanity checks: for example, you don't want cleaned names stomping over existing files.
Since you aren't processing the directories recursively, you can use glob to good advantage. No need to go through the hassles of opening directories, reading them, filtering out dot-dirs, etc.
Don't invoke subroutines with a leading ampersand (search this issue for more details).
Many Unix-like systems include a Perl-based rename command for quick-and-dirty renaming jobs. It's good to know about even if you don't use it for your current project.
Here's a rough outline:
use strict;
use warnings;
sub main {
# Map the input arguments to oldname-newname pairs.
my #renamings =
map { [$_, cleaned($_)] }
map { -f $_ ? $_ : glob("$_/*") }
#_;
# Sanity checks first.
# - New names should be unique.
# - New should not already exist.
# - ...
# Then rename.
for my $rnm (#renamings){
my ($old, $new) = #$rnm;
rename($old, $new) unless $new eq $old;
}
}
sub cleaned {
# Allowed characters: word characters, hyphens, periods, slashes.
# Adjust as needed.
my $f = shift;
$f =~ s/[^\w\-\.\/]/-/g;
return $f;
}
main(#ARGV);
Don't blame Windows for your problems. Linux is much more lax, and the only character it prohibits from its file names is NUL.
It isn't clear exactly what you are asking. Did you publish your code for a critique, or are you having problems with it?
As for the specific questions you asked,
What other characters besides spaces, parenthesis should be translated?
Windows allows any character in its filenames except for control characters from 0x00 to 0x1F and any of < > \ / * ? |
DEL at 0x7F is fine.
Within the ASCII set, that leaves ! # $ % & ' ( ) + , - . : ; = # [ ] ^ _ ` { } ~
The set of characters you need to translate depends on your reason for doing this. You may want to start by excluding non-ASCII characters, so your code should read something like
$name2 =~ tr/\x21-\x7E/-/c
which will change all non-ASCII characters, spaces and DEL to hyphens. Then you need to go ahead and fix all the ASCII characters that you consider undersirable.
What other file attributes (besides file type (file/dir) and permissions) should be checked?
The answer to this has to be according to your purpose. If you are referring only to whether renaming a file or directory as required is possible, then I suggest that you just let rename itself tell you whether it succeeded. It will return a false value if the operation failed, and the reason will be in $!.
Does Perl offer a pushd/popd equivalent, or is chdir a reasonable solution to traversing the directory tree?
If you want to work with that idiom, then you should take a look at File::pushd, which allows you to temporarily chdir to a new location. A popd is done implicitly at the end of the enclosing block.
I hope this helps. If you have any other specific questions then please make them known by editing your original post.

Copy the last modified Dir from one location to another using Perl

fairly new to perl so this is most likely is not the best code which is why I am posting. I got this to work but was wondering if there is a better way. I do not have the ability to download modules. I am copying the last modified directory in a build folder from one server to another server. The argument allows me to choose which build directory to choose from.
Thanks
#!C:\strawberry\perl
use warnings;
use strict;
use File::Copy::Recursive;
my $NewFolder = `(dir /o-d/ad/b \\\\myserver1.name.com\\builds\\$ARGV[0] | head -1)`;
chomp($NewFolder);
$dir1 = "\\\\myserver1.name.com\\builds\\$ARGV[0]/$NewFolder";
$dir2 = "\\\\myserver2.name.com\\builds\\$ARGV[0]/Backup/$NewFolder";
File::Copy::Recursive::dircopy $dir1, $dir2 or die "Copy failed: $!";
Use forward slashes. It just makes your code easier to read:
$dir1 = "\\\\myserver1.name.com\\builds\\$ARGV[0]/$NewFolder";
vs.
$dir1 = "//myserver1.name.com/builds/$ARGV[0]/$NewFolder";
Also, don't do system calls where Perl can do it. For example, Perl can see the last modification date of a file via the stat. Even better is the File::stat module that makes the stat command so much easier to use.
Don't use #ARGV in your programs. Instead, read the variables from #ARGV into your own variables. It makes your program easier to understand, and your own variables have limited scope while #ARGV is global.
Use modern conventions. Variable names should be in all lower case, and use underscores to separate out words. That is $new_folder vs. $NewFolder. Is this arbitrary? Yes, but it's a convention followed by most Perl developers. It means not wondering if the variable is $newFolder, $NewFolder, or $newfolder because you know by these rules it is $new_folder.
And finally, use autodie; This will kill your program whenever a file operation fails. This turns perl from a check function for errors programming language into a exception checking language. This way, you don't have to worry whether or not you have to check for a failed IO operation.
Here's a completely untested, error ridden example:
use strict;
use warnings;
use autodie;
use File::Copy::Recursive qw(dircopy); #Optional Module
use File::Stat;
use constants {
ORIG_SERVER => '//myserver1.name.com/builds',
TO_SERVER => '//myserver2.name.com/builds',
};
my $from_directory = shift;
#
# Find newest directory
#
opendir my $dir_fh, ORIG_SERVER . "/$from_directory";
my $newest_directory;
while ( my $sub_directory = readdir $dir_fh ) {
next if $sub_directory eq "." or $sub_directory eq "..";
next unless -d $sub_directory;
if ( not defined $newest_directory ) {
$youngest_directory = $sub_directory;
next;
}
my $youngest_directory_stat = stat ORIG_SERVER . "/$directory/$newest_directory";
my $sub_directory_stat = stat ORIG_SERVER . "/$directory/$sub_directory";
if ( $newest_directory_stat->mtime > $sub_directory_stat->mtime ) {
$newest_directory = $sub_directory;
}
}
dircopy ORIG_SERVER . "/$directory/$youngest_directory",
TO_SERVER . "/$directory/$youngest_directory/backup";
My program is a lot longer than your program because your program depended upon various system operating commands, like dir and head which I don't believe is a standard Windows OS command. Instead, I read each entry under that directory into my loop. Anything that's not a directory, I toss (next if -d $sub_directory) and I toss out the special directories . and ...
After that, I use stat to find the youngest directory which to me means the one with the newest modification time. Note that Unix doesn't store creation time. However, according to perlport ctime is creation time on Win32, so you might prefer that instead of mtime.
If I didn't use File::stat, instead of this:
my $youngest_directory_stat = stat ORIG_SERVER . "/$directory/$newest_directory";
my $sub_directory_stat = stat ORIG_SERVER . "/$directory/$sub_directory";
if ( $newest_directory_stat->mtime > $sub_directory_stat->mtime ) {
$newest_directory = $sub_directory;
}
I could have done this:
my $newest = ORIG_SERVER . "/$directory/$newest_directory";
my $sub_dir = ORIG_SERVER . "/$directory/$sub_directory";
if ( stat( $newest )[9] > stat( $sub_dir )[9] ) {
$newest_directory = $sub_directory;
}
The stat command without File::stat returns an array of values, and I could have simply used the [9] element of that array. However, what is 9? Even though it could of saved me a few lines of code, and including an extra Perl module, it's better to use File::stat.
One thing you notice is that constants don't interpolate which means I have to keep doing things like this:
my $youngest_directory_stat = stat ORIG_SERVER . "/$directory/$newest_directory";
However, you can use this bit of Perlish black magic to interpolate constants inside quotes:
my $youngest_directory_stat = stat "#{[ORIG_SERVER]}/$directory/$newest_directory";
Hope that helps.

pattern search in all the files in a directory

I have the pattern something like "keyword : Multinode". Now, I need to search this pattern in all the files in a directory. If we found the pattern in any of the file, a non empty-string should be returned. It may contain file-name or directory name
In shell scripting the following will do the same
KeyMnode=grep -w "keyword : Multinode" ${dirname}/*
I thought of using find(subroutine,directory_path) and inside the sub-routine I want to traverse through the entire directory for all its entries. For every entry I want to put a check whether it is a readable file or not. If the file is readable, I want to search for the required pattern "keyword : Multinode" in the file found. If we hit with a success, the entire find command should result in a non-empty string(preferably only the existing directory Name) otherwise with an empty string. Please let me know if you need any further information.
I want this to be done using perl. Please help me with the solution.
Here are some Perl tools that will be useful in doing what you described:
File::Find will do a recursive search for files in a directory and its children, running code (the \&wanted callback in the docs) against each one to determine whether it meets your criteria or not
The -r operator will tell you whether a file is readable (if (-r $file_name)...)
open will get you access to the file and <$fh> will read its contents so that you can check with a regular expression whether they match your target pattern
Adding \b to the beginning and end of the pattern will cause it to match only at word boundaries, similar to grep's -w switch
If you have more specific issues, please post additional questions with code that demonstrates them, including statements both of what you expected to happen and of how the actual results differed from your expectation and we'll be happy to help resolve those issues.
Edit: Cleaned up and runnable version of code from comment:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
use File::Find;
# Get $dirname from first command-line argument
my $dirname = shift #ARGV;
find(\&do_process, $dirname); # quotes around $dirname weren't needed
my ($KeyMnode, $KeyThreads);
sub do_process {
# chomp($_); - not needed; $_ isn't read from a file, so no newline on it
if (-r $_) { # quotes around $_ weren't needed
# $_ is just the final part of the file name; it may be better for
# reporting the location of matches to set $file_name to
# $File::Find::name instead
my $file_name = $_;
open(my $fh, '<', $file_name); # Use three-arg open!
while (<$fh>) {
chomp();
# Note that, if you store all matches into the same scalar values,
# you'll end up with only the last value found for each pattern; you
# may want to push the matches onto arrays instead.
if (/\bkeyword : Multinode\b/i) { $KeyMnode = "$file_name:$_"; }
if (/\bkeyword : Threads\b/i) { $KeyThreads = "$file_name:$_"; }
}
}
}