Renaming and Moving Files in Bash or Perl - perl

Hi, I'm completely new to Bash and Stack Overflow.
I need to move a set of files (all contained in the same folder) to a target folder where files with the same name could already exist.
In case a specific file exists, I need to rename the file before moving it, by appending for example an incremental integer to the file name.
The extensions should be preserved (in other words, that appended incremental integer should go before the extension). The file names could contain dots in the middle.
Originally, I was thinking about comparing the two folders to have a list of the existing files (I did this with "comm"), but then I got a bit stuck. I think I'm just trying to do things in the most complicated possible way.
Any hint to do this in the "bash way"? It's OK if it is done in a script other than bash script.

If you don't mind renaming the files that already exist, GNU mv has the --backup option:
mv --backup=numbered * /some/other/dir
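A quick sketch of what that produces (throwaway paths made up here; `--backup=numbered` is GNU coreutils only):

```shell
# Set up a source and destination that collide on "file.txt"
mkdir -p /tmp/backup-demo/src /tmp/backup-demo/dst
echo one > /tmp/backup-demo/src/file.txt
echo two > /tmp/backup-demo/dst/file.txt

# GNU mv renames the existing target to "file.txt.~1~" before moving
mv --backup=numbered /tmp/backup-demo/src/file.txt /tmp/backup-demo/dst/
ls /tmp/backup-demo/dst    # file.txt  file.txt.~1~
```

Note the backup suffix goes after the full name (including the extension), so this doesn't satisfy the "integer before the extension" requirement, but it is a one-liner.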

Here is a Bash script:
source="/some/dir"
dest="/another/dir"
find "$source" -maxdepth 1 -type f -printf "%f\n" | while read -r file
do
suffix=
if [[ -a "$dest/$file" ]]
then
suffix=".new"
fi
# to make active, comment out the next line and uncomment the line below it
echo 'mv' "\"$source/$file\"" "\"$dest/$file$suffix\""
# mv "$source/$file" "$dest/$file$suffix"
done
The suffix is added blindly. If you have files named like "foo.new" in both directories, then the result will be one file named "foo.new" and a second named "foo.new.new", which might look silly, but is correct in that it doesn't overwrite the file. However, if the destination already contains "foo.new.new" (and "foo.new" is in both source and destination), then "foo.new.new" will be overwritten.
You can change the if above to a loop in order to deal with that situation. This version also preserves extensions:
source="/some/dir"
dest="/another/dir"
find "$source" -maxdepth 1 -type f -printf "%f\n" | while read -r file
do
suffix=
count=
ext=
base="${file%.*}"
if [[ $file =~ \. ]]
then
ext=".${file##*.}"
fi
while [[ -a "$dest/$base$suffix$count$ext" ]]
do
(( count+=1 ))
suffix="."
done
# to make active, comment out the next line and uncomment the line below it
echo 'mv' "\"$source/$file\"" "\"$dest/$file$suffix$count$ext\""
# mv "$source/$file" "$dest/$file$suffix$count$ext"
done
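The counter-and-suffix trick from the loop above can be tried in isolation (a sketch with made-up names):

```shell
dest=/tmp/suffix-demo
mkdir -p "$dest"
touch "$dest/report.txt" "$dest/report.1.txt"   # pre-existing collisions

file="report.txt"
base="${file%.*}"    # "report"
ext=".${file##*.}"   # ".txt"
suffix=
count=
while [ -e "$dest/$base$suffix$count$ext" ]; do
    count=$(( count + 1 ))
    suffix="."
done
echo "$base$suffix$count$ext"    # report.2.txt
```

Since both "report.txt" and "report.1.txt" already exist, the loop runs twice and settles on "report.2.txt", keeping the integer before the extension.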

As per the OP, this can be Perl, not just Bash. Here we go.
NEW SOLUTION: (paying attention to extension)
~/junk/a1$ ls
f1.txt f2.txt f3.txt z1 z2
~/junk/a1$ ls ../a2
f1.txt f2.1.txt f2.2.txt f2.3.txt f2.txt z1
# I split the one-liner into multiple lines for readability
$ perl5.8 -e
'{use strict; use warnings; use File::Copy; use File::Basename;
my @files = glob("*"); # assume current directory
foreach my $file (@files) {
my $file_base2 = basename($file);
my ($file_base, $ext) = ($file_base2 =~ /(.+?)([.][^.]+)?$/);
$ext = '' unless defined $ext; # files with no extension
my $new_file_base = "../a2/$file_base";
my $new_file = $new_file_base . $ext;
my $counter = 1;
while (-e $new_file) {
$new_file = "$new_file_base." . $counter++ . $ext;
}
copy($file, $new_file)
|| die "could not copy $file to $new_file: $!\n";
} }'
~/junk/a1> ls ../a2
f1.1.txt f1.txt f2.1.txt f2.2.txt f2.3.txt f2.4.txt f2.txt f3.txt
z1 z1.1 z2
OLD SOLUTION: (not paying attention to extension)
~/junk/a1$ ls
f1 f2 f3
~/junk/a1$ ls ../a2
f1 f2 f2.1 f2.2 f2.3
# I split the one-liner into multiple lines for readability
$ perl5.8 -e
'{use strict; use warnings; use File::Copy; use File::Basename;
my @files = glob("*"); # assume current directory
foreach my $file (@files) {
my $file_base = basename($file);
my $new_file_base = "../a2/$file_base";
my $new_file = $new_file_base;
my $counter = 1;
while (-e $new_file) { $new_file = "$new_file_base." . $counter++; }
copy($file,$new_file)
|| die "could not copy $file to $new_file: $!\n";
} }'
~/junk/a1> ls ../a2
f1 f1.1 f2 f2.1 f2.2 f2.3 f2.4 f3

I feel bad for posting this without testing it. However, it is late and I have work in the morning. My attempt would look something like this:
## copy files from src to dst
## inserting ~XX into any name between base and extension
## where a name collision would occur
src="$1"
dst="$2"
case "$dst" in
/*) :;; # absolute dest is fine
*) dst=$(pwd)/$dst;; # relative needs to be fixed up
esac
cd "$src"
find . -type f | while read x; do
x=${x#./} # trim off the ./
t=$x; # initial target
d=$(dirname "$x"); # relative directory
b=$(basename "$x"); # initial basename
ext=${b##*.}; # extension
b=${b%.*}; # basename with ext. stripped off
let zz=0; # initial numeric
while [ -e "$dst/$t" ]; do
# target exists, so try constructing a new target name
t="$d/$b~$zz.$ext"
let zz+=1;
done
echo mv "./$x" "$dst/$t"
done
Overall the strategy is to get each name from the source path, break it into parts, and, for any collision, iterate over names of the form "base~XX.extension" until we find one that doesn't collide.
Obviously I have prepended the mv command with an echo because I'm a coward. Remove that at your own (files') peril.

If you don't need an incremental suffix, rsync can do the job:
rsync --archive --backup --suffix=.sic src/ dst
Update:
A find/sed/sort pipeline is used to manage versioned backup files:
#!/bin/bash
src="${1}"
dst="${2}"
if test ! -d "${src}" -o ! -d "${dst}"; then
echo "Usage: $0 SRC_DIR DST_DIR" >&2
exit 1
fi
rsync --archive --backup "${src}/" "${dst}/"
new_name() {
local dst=$1
local prefix=$2
local suffix=$3
local max=$(find "${dst}" -type f -regex ".*${prefix}\.[0-9]+\.${suffix}\$" \
| sed 's/.*\.\([0-9]*\)\..*/\1/' | sort -n | tail -n 1)
let max++
echo ${prefix}.${max}.${suffix}
}
# swap BACKUP-extension/real-extension
for backup_file in $(find "$dst" -name "*~"); do
file=${backup_file%~}
prefix=${file%.*}
suffix=${file##*.}
suffix=${suffix%\~}
mv "${backup_file}" "$(new_name "$dst" "$prefix" "$suffix")"
done
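The heart of new_name is the find|sed|sort pipeline. A standalone sketch (file names invented here) shows why the numeric sort matters once version numbers reach double digits:

```shell
mkdir -p /tmp/ver-demo
touch /tmp/ver-demo/report.1.doc /tmp/ver-demo/report.2.doc /tmp/ver-demo/report.10.doc

# Extract the version numbers, sort numerically, keep the highest
max=$(find /tmp/ver-demo -type f -regex ".*report\.[0-9]+\.doc\$" \
    | sed 's/.*\.\([0-9]*\)\..*/\1/' | sort -n | tail -n 1)
echo "$max"    # 10 (a plain lexicographic sort would have picked 2)
```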


Perl directory is getting past if(! -d) statement? [readdir results]

Okay so I have a program that basically looks into a passed in directory, if any file names match a pattern I will make a directory and move that specific file and any that matches it (regardless of extension) into that directory. Now if they don't match I should move them into the PassedInDir/misc/ directory.
I have a condition in both cases to avoid passing in any directory (as my program isn't ready to deal with those yet) something like if( ! -d $fp).
Everything works fine when I run it the first time on the directory. However, when I run it again on the same directory (which should now only contain directories) I get the error Could not move file assignmentZ to destination DataB/misc at projectSorter.pl line 16. assignmentZ is a directory, yet it somehow gets past the (! -d) check in the second case.
#!/usr/bin/perl -w
use File::Copy;
if(@ARGV < 1){
print "\nUsage: proj6.pl <directory>\n\n";
exit;
}
die("\nDirectory $ARGV[0] does not exist\n\n") if( ! -e $ARGV[0]);
opendir( DIR, $ARGV[0]) or die("\nCould not open directory $ARGV[0]\n\n");
while(($fp = readdir(DIR))){
if($fp =~ m/proj(.*)\./){
(! -d "$ARGV[0]/assignment$1") && (mkdir "$ARGV[0]/assignment$1");
move("$ARGV[0]/$fp" , "$ARGV[0]/assignment$1") or die("Could not move file $fp to destination $ARGV[0]/assignment$1");
}
elsif(! -d $fp){ #gets past here!!!
(! -d "$ARGV[0]/misc") && (mkdir "$ARGV[0]/misc");
move("$ARGV[0]/$fp" , "$ARGV[0]/misc") or die("Could not move file $fp to destination $ARGV[0]/misc");
}
}
It is the only directory to do it out of the ones previously made by running my program once. I am curious about why this is happening.
$fp as set by readdir is relative to the scanned directory. Either chdir to the scanned directory, or prepend the scanned directory name for the -d test.
You already use "$ARGV[0]/$fp" as the argument to the move function.
perldoc -f readdir
readdir DIRHANDLE
Returns the next directory entry for a directory opened by
"opendir". […]
If you're planning to filetest the return values out of a
"readdir", you'd better prepend the directory in question.
Otherwise, because we didn't "chdir" there, it would have been
testing the wrong file.
Some suggestions.
‣ Don't use the -w flag with Perl. Some modules turn warnings off to do their work but the -w flag is global. With it, they will report warnings that should be ignored.
‣ Always have these two lines at the top of every script.
use strict;
use warnings;
These will catch a lot of errors in your code. See perldoc strict and perldoc warnings for more details.
‣ Use glob() or File::Find instead of opendir/readdir/closedir.
‣ Use make_path() from File::Path instead of mkdir.
‣ Use an if statement for conditional execution instead of &&.
‣ Place blank lines in your code to make reading it easier.
File::Find and File::Path are standard modules that come installed with Perl. For a list of the standard modules, see perldoc perlmodlib.
#!/usr/bin/perl
# --------------------------------------
# pragmas
use strict;
use warnings;
# --------------------------------------
# modules
use File::Copy;
use File::Path qw( make_path );
# --------------------------------------
# main
# make sure there is something to work on
if(@ARGV < 1){
print "\nUsage: proj6.pl <directory>\n\n";
exit;
}
# arguments should be directories
for my $src_dir ( @ARGV ){
# validate the source directory
die("\n$src_dir does not exist\n\n") if( ! -e $src_dir);
die("\n$src_dir is not a directory\n\n") if( ! -d $src_dir);
# move proj* files
for my $proj ( glob( "$src_dir/proj*" )){
# get the proj number
( my $number ) = $proj =~ m/proj(.*)\./;
# get the destination directory
my $dst_dir = "$src_dir/assignment$number";
# create the directory where it goes
if( ! -d $dst_dir ){
make_path( $dst_dir ) or die "could not make path $dst_dir";
}
# move the file
move( $proj, $dst_dir ) or die( "could not move file $proj to destination $dst_dir" );
} # end of $proj files
# move other files
for my $file ( grep { ! -d } glob( "$src_dir/*" )){
# get the destination directory
my $dst_dir = "$src_dir/misc";
# create the directory where it goes
if( ! -d $dst_dir ){
make_path( $dst_dir ) or die "could not make path $dst_dir";
}
# move the file
move( $file, $dst_dir ) or die( "could not move file $file to destination $dst_dir" );
} # end other files
} # end of src_dir

copy without overwrite and preserve existing files with extended name

I need to copy files with specific extensions, but the problem is that there are multiple files with the same file name. I don't want to overwrite them, but rather store the copies with extended names like file_1, file_2, etc. When I tried this on the Unix command line the files were overwritten even though I used cp -n.
Is there any way I can do this task using the command line or Perl?
The command I used was
find -L . -name "*.txt" -exec cp -n {} -t ~/destination \;
You can also use the cp --backup=numbered option.
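Combined with the original find, that might look like this (a sketch; the paths are invented, and --backup is GNU cp only):

```shell
mkdir -p /tmp/cpdemo/src/a /tmp/cpdemo/src/b /tmp/cpdemo/dest
echo first  > /tmp/cpdemo/src/a/notes.txt
echo second > /tmp/cpdemo/src/b/notes.txt

# Each name collision becomes a numbered backup (notes.txt.~1~, ...)
# instead of being skipped (cp -n) or silently overwritten
find -L /tmp/cpdemo/src -name "*.txt" \
    -exec cp --backup=numbered -t /tmp/cpdemo/dest {} +
ls /tmp/cpdemo/dest    # notes.txt  notes.txt.~1~
```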
In Perl (untested)
perl -MFile::Copy=cp -e '-e ($n = "~/destination/$_") or cp $_, $n for @ARGV' *.txt
The Perl script below recursively finds the files and copies them into the destination folder; if a file already exists there, it renames the copy as filename_1, filename_2, and so on.
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Spec::Functions qw'catfile';
use File::Copy qw'move';
#use autodie qw'move';
use File::Basename;
my ($filename);# = 'DUMBFILE';
my $origin = '/home/itadmin/FoldersFind/OriginalFolder';
my $destination = '/home/itadmin/FoldersFind/Destination';
mkdir($destination, 0777);
my ($path);
find(\&wanted, $origin);
sub wanted
{
# Skip anything that is not a matching file; without these early
# returns, $filename keeps its value from the previous call
return unless -f $File::Find::name;
return unless $File::Find::name =~ m/\.(txt|html|xml)$/;
$filename = basename($File::Find::name);
$path = "$destination/$filename";
my $cnt;
while(-e $path)
{
$cnt++;
$path = catfile $destination, "$filename.$cnt";
}
# File::Find chdirs into each directory, so the bare basename works here
move($filename, $path) or warn "could not move $filename to $path: $!";
}
Input: (Fetching files may be duplicate)
/OriginalFolder/<folders1>/*file
/OriginalFolder/<folders2>/*file
Output: (Renaming)
/Destination/*file_<count> #/Destination/*file_1
/Destination/*file_<count> #/Destination/*file_2

Remove elements in one array from another array in perl

I want to get all elements in clearcase, store them in an array, and then remove the symbolic links from that array. Problem is I don't know how to remove all elements in one array that are contained in another array since I'm new to perl.
Below is my code so far.
foreach ${dir} (@{code_vob_list})
{
${dir} =~ s/\n//;
open(FIND_FILES, "$cleartool find ${dir} -type f -exec 'echo \$CLEARCASE_PN' |") or die "Can't stat cleartool or execute : $!\n"; #This command gets all files
@{files_found} = <FIND_FILES>;
open(SYMBOLIC_FIND_FILES, "$cleartool find ${dir} -type l -exec 'echo \$CLEARCASE_PN' |") or die "Can't stat cleartool or execute : $!\n"; #This command get all symbolic links
@{symbolic_files_found} = <SYMBOLIC_FIND_FILES>;
#Filter away all strings contained in @{symbolic_files_found} from @{files_found}
foreach my ${file} (@{files_found})
{
#Here I will perform my actions on @{files_found} that do not contain any symbolic link paths from @{symbolic_files_found}
}
}
Thanks in advance
To filter an array, you can use grep:
my @nonlinks = grep { my $f = $_;
! grep $_ eq $f, @symbolic_files_found }
@files_found;
But it's usually cleaner to use a hash.
my %files;
@files{ @files_found } = (); # All files are the keys.
delete @files{ @symbolic_files_found }; # Remove the links.
my @nonlinks = keys %files;
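The same set difference can also be checked at the shell level (a sketch with invented lists; the hash slice above is the idiomatic Perl way):

```shell
# grep -v: invert, -x: whole-line match, -F: fixed strings, -f: patterns from file
printf '%s\n' a b c d > /tmp/files_found.txt
printf '%s\n' b d     > /tmp/links_found.txt
grep -vxF -f /tmp/links_found.txt /tmp/files_found.txt    # prints: a c (one per line)
```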
I suggest that you install and use List::Compare.
As I wrote in my comment, I'm not sure whether you prefer to write your identifiers like that, and I'm also unclear whether you've avoided backticks `...` (the same as qx{...}) in favour of a pipe open for a reason, but the code below is closer to how I'd write yours.
If you prefer, get_unique has a synonym get_Lonly, which you may find more expressive.
use List::Compare;
for my $dir ( @code_vob_list ) {
chomp $dir;
my @files_found = qx{$cleartool find $dir -type f -exec 'echo \$CLEARCASE_PN'};
chomp @files_found;
my @symbolic_files_found = qx{$cleartool find $dir -type l -exec 'echo \$CLEARCASE_PN'};
chomp @symbolic_files_found;
my $lc = List::Compare->new('--unsorted', \@files_found, \@symbolic_files_found);
my @unique = $lc->get_unique;
}

Perl search for files and rename them while copying to different directory [duplicate]

This question already has answers here:
Using Perl to rename files in a directory
(4 answers)
Closed 7 years ago.
I'm trying the following code to search for .txt files and rename them while copying to a different directory.
#!/bin/perl
use File::Basename;
@txtfiles = <*/*.txt>;
foreach my $file(@textfiles){
$dir = dirname($file);
$file = basename($file);
$file =~ s/(\d+)/$dir/; # renaming number with $dir
`mkdir -p summary` unless -d summary;
`cp $file summary`;
}
The above code gives an error saying there is no such file to copy; however, print statements at each line show the renamed file names correctly.
NOOOOO!
Don't use system commands -- especially since there are Perl commands that you can use.
`mkdir -p summary` unless -d summary;
`cp $file summary`;
Use the Perl commands!
use File::Copy; # Standard Perl Module. USE IT!
...
mkdir 'summary' unless -d 'summary'; # No need for `-p`
copy $file, 'summary';
Here's a revised script:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
# use File::Basename; -- No need for this in this version
use File::Copy;
# What files do you want to find?
# Do you want to find the immediate files or the
# files in the subdirectories?
my @text_files = glob('*.txt'); # Immediate files only
mkdir 'summary' if not -d 'summary'; # Make this dir before the loop
for my $file ( @text_files ) {
next unless -f $file; # Make sure this is a file you want to copy!
#
# No need for 'dirname' and 'basename' with '*.txt' glob
#
if ( not copy $file, 'summary' ) { # Check the outcome of this command.
warn qq(Could not copy file "$file" to "summary".);
}
}
Let us know if you need to copy files in subdirectories rather than just in the immediate directory. Maybe you can use make_path from File::Path, or the Perl version of find from File::Find. These are standard Perl modules that all Perl installations have.
Addendum
I don't want current directory. The files are resides inside one directory i.e. foo/text_v0001.txt, foo/text_v0002.txt, foo_au/text_v0003.txt, foo_au/text_v0004.txt continues.... I want to replace the numbers with directory name e.g. foo/text_v0001.txt should renamed to text_foo.txt and foo/text_v0002.txt should renamed to text_foo.txt (since in same folder we can't have same name files, we can add part2 and the end of second file i.e. text_fooPart2.txt).
That last part is a doozy, and it's a new requirement too. I need to verify that a file doesn't already exist with the same name, and if it does, I need to make sure that I find the next available name.
If I find that a file already exist, I'll loop incrementing a duplicate file counter until I find a file name that doesn't already exist.
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use File::Basename;
use File::Copy;
use File::Glob; # Improved Glob matching.
use constant {
DIRECTORY => 'summary',
};
# What files do you want to find?
# Do you want to find the immediate files or the
# files in the subdirectories?
#
# Let's do a more sophisticated pattern making sure we're catching
# the files we want.
#
my @text_files = glob('*/*.txt'); # Subdirectories only
mkdir DIRECTORY if not -d DIRECTORY; # Make this dir before the loop
for my $file ( @text_files ) {
my $dir_name = dirname $file;
my $file_name = basename $file;
say "DEBUG: On '$file'.";
#
# Let's make sure that the file name matches the expected
# pattern. If the substitution doesn't succeed, we assume
# this file shouldn't be copied, and skip it.
#
# I'm searching for a file that has the suffix '_vxxxx.txt' where
# 'xxxx' is some number. I remove the number and the letter `v`,
# and add in the directory name.
#
if ( not $file_name =~ s/_v(\d+)\.txt$/_$dir_name.txt/ ) {
warn qq("$dir_name/$file_name" has not been copied.);
next;
}
#
# If the name matches, make sure it's a file
#
if ( not -f $file ) {
warn qq("$file" is not a file and wasn't copied.);
next
}
#
# Now make sure file name is unique
#
if ( -f DIRECTORY . "/$file_name" ) { # File name already exists
say qq(DEBUG: Duplicate File '$file_name' detected!);
my $dup_file_counter = 2;
( my $file_no_suffix = $file_name ) =~ s/\.txt$//;
#
# Find a non-matching name
#
for (;;) {
my $new_file_name = $file_no_suffix . "_part_$dup_file_counter.txt";
say "DEBUG: New file name '$new_file_name'";
say qq(DEBUG: if ( not -e @{[DIRECTORY]} . "/$new_file_name" ) { );
if ( not -e DIRECTORY . "/$new_file_name" ) {
$file_name = $new_file_name;
last;
}
else {
$dup_file_counter += 1;
}
}
}
if ( not copy $file, DIRECTORY . "/$file_name" ) { # Check the outcome of this command.
warn qq(Could not copy file "$file" to directory ") . DIRECTORY . qq(".);
}
}
In the loop you are using @textfiles instead of @txtfiles. Use strict.
#!/usr/local/bin/perl
use File::Basename;
use strict;
use warnings;
my @txtfiles = glob("*.txt");
foreach my $file(@txtfiles){
my $dir = dirname($file);
$file = basename($file);
$file =~ s/(\d+)/$dir/; # renaming number with $dir
`mkdir -p summary` unless -d "summary";
`cp $file summary`;
}

How to loop through a directory in perl

I have a directory dir1 containing several hundred files, which are to be iteratively processed by a speech program called HRest. The program is supposed to take each file one by one, process it, and put it in a new directory (dir2 for the first iteration) to be used in the next iteration. My problem is that I don't know whether the way I've employed to loop through the files in dir1, and the way I am running the script (trainhmms.pl dir1 1), are correct.
If the files in dir1 are L1, L2, L3, ..., L500, I want HRest to be executed as
HRest -T 1 -I timedlabels_train.mlf -t -i 20 -l dir1/L1 -M dir2 -S train.scp
for the first file, and as
HRest -T 1 -I timedlabels_train.mlf -t -i 20 -l dir1/L2 -M dir2 -S train.scp
for the next file, and so on for all files. Then in next call of the script, I want it to be changed to
HRest -T 1 -I timedlabels_train.mlf -t -i 20 -l dir2/L1 -M dir3 -S train.scp
for the first file, and so on..
Here is the script for the first iteration:
#!/usr/bin/perl
use File::Slurp;
# Usage: trainhmms.pl dir1 1
# dir1: Folder containing models after being initialised by HInit (L1,L2,..,L512)
$file = $ARGV[0];
$iter = $ARGV[1];
my #files = read_dir '/Users/negarolfati/Documents/Detection_rerun/AF_TIMIT/1_state//trainHMMs/dir1';
for my $file ( #files ) {
$iter2 = $iter+1;
$cmd = "HRest -T 1 -I timedlabels_train.mlf -t -i 20 -l '$dir[$iter]/$file' -M '$dir[$iter2]' -S train.scp ";
system("$cmd");
}
You can't just use readdir on a directory string. You have to opendir the string, then readdir from the directory handle that you get, and finally closedir the handle.
You must also remember that readdir returns directory names as well as file names, and the pseudo-directories . and .. too. To filter out just the files, you can use the -f test operator. And it is usually most convenient to chdir to the directory you are reading so that you don't have to append the path to each file name that readdir returns before you do the test.
I don't know what HRest is, but if your command line must be executed from a specific working directory (perhaps to access timedlabels_train.mlf and train.scp) then please say so, as I will have to remove the chdir statement.
Something like this should get you going. I have used autodie, which does automatic checks on file system operations. It saves having to check chdir and opendir explicitly each time with or die $!.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use File::Spec::Functions 'catdir';
my ($file, $iter) = @ARGV;
my $root = '/Users/negarolfati/Documents/Detection_rerun/AF_TIMIT/1_state/trainHMMs';
my $dir1 = catdir $root, 'dir'.$iter;
my $dir2 = catdir $root, 'dir'.($iter+1);
chdir $dir1;
opendir my ($dh), '.';
my #files = grep -f, readdir $dh;
closedir $dh;
for my $file ( #files ) {
my $cmd = "HRest -T 1 -I timedlabels_train.mlf -t -i 20 -l '$dir1/$file' -M '$dir2' -S train.scp";
system($cmd);
}
Update
Here is an alternative version that avoids chdir so that the current working directory remains unchanged.
I have added the secondary loop that was in your bash script. I have also added a print statement so that you can see each command before it is executed.
To allow the system call to go ahead, just delete or comment out the next statement.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use File::Spec::Functions qw/ catdir catfile /;
STDOUT->autoflush;
my $root = '/Users/negarolfati/Documents/Detection_rerun/AF_TIMIT/1_state/trainHMMs';
for my $iter (1 .. 4) {
my $dir1 = catdir $root, 'dir'.$iter;
my $dir2 = catdir $root, 'dir'.($iter+1);
opendir my ($dh), $dir1;
while (my $node = readdir $dh) {
my $file = catfile($dir1, $node);
next unless -f $file;
my $cmd = "HRest -T 1 -I timedlabels_train.mlf -t -i 20 -l '$file' -M '$dir2' -S train.scp";
print $cmd, "\n";
next; # Remove for full functionality
system($cmd);
}
closedir $dh;
}
You can do this:
my @files = <$path/*>;
foreach my $filename ( reverse(@files) ) {
...
}