How to link zip folders to newly created folder - perl

I am trying to copy data from a folder (named Zip) to a set of newly created folders.
Zip folder content is:
SO_90_X_L001_R1.fastq.gz
SO_100_X_L001_R1.fastq.gz
SO_101_X_L001_R1.fastq.gz
and I have created the following empty folders:
SO_90
SO_100
SO_101
Without giving keyboard input, is it possible to copy those zipped files to matching folders using Perl?
I tried the script below, but I am still not getting the proper output.
#!usr/bin/perl
use File::Copy "cp";

open(my $F, "a.txt") or die("cant open a.txt\n");
while (<$F>) {
    next unless /\S/;
    mkdir $_;
}
close($F);

for my $file (<SO_/*.fastq.gz>) {
    print $_;
    if ( $file =~ m!SO_/(.*)_X_L001_R1.fastq.gz! ) {
        mkdir($_); # comment this line if not necessary
        cp($file, "$1/") or warn("Copy '$file, $1' failed\n");
    } else {
        warn("$file is not ending in '_X_L001_R1.fasta.gz'\n");
    }
}

I am writing a new answer because the question is in fact different.
We have several files like ZIP/SO_100_X_L001_R1.fastq.gz to copy to
SO_100/....
#!/usr/bin/perl
use File::Copy "cp";

for my $file (<ZIP/*.fastq.gz>) {
    print "$file\n";
    if ( $file =~ m!ZIP/(.*)_X_L001_R1.fastq.gz! ) {
        mkdir($1); # comment this line if not necessary
        cp($file, "$1/") or warn("Copy '$file, $1' failed\n");
    } else {
        warn("$file does not end in '_X_L001_R1.fastq.gz'\n");
    }
}
Edit: I added some warnings to help with debugging.

perl -nle 'mkdir $_ if /\S/' a.txt
-l to remove \n from folder names (otherwise folder name would be SO_100\n)
if /\S/ to skip possible empty-lines in the input file
(your question changed....) Update: If you need more complex processing, you can build a script that includes something like:
open(my $F, "a.txt") or die("cant open a.txt\n");
while(<$F>){
chomp;
next unless /\S/;
mkdir $_ ;
#... do other things related with this folder...
}
close $F;
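For reference, a minimal sketch that puts both steps together, assuming the source folder is named ZIP (the question calls it Zip; adjust the name to match) and the folder names are listed one per line in a.txt:

#!/usr/bin/perl
use strict;
use warnings;
use File::Copy "cp";

# Create the target folders listed in a.txt (one name per line).
open(my $F, '<', 'a.txt') or die "can't open a.txt: $!\n";
while (my $dir = <$F>) {
    chomp $dir;                # strip the newline, or mkdir would get "SO_100\n"
    next unless $dir =~ /\S/;  # skip blank lines
    unless (-d $dir) {
        mkdir $dir or warn "mkdir '$dir' failed: $!\n";
    }
}
close $F;

# Copy each fastq.gz into the folder matching its sample prefix.
for my $file (<ZIP/*.fastq.gz>) {
    if ($file =~ m!ZIP/(.*)_X_L001_R1\.fastq\.gz!) {
        cp($file, "$1/") or warn "Copy '$file' to '$1/' failed: $!\n";
    } else {
        warn "$file does not end in '_X_L001_R1.fastq.gz'\n";
    }
}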

Related

Perl - A way to get only the first (.txt) filename from another directory without loading them all?

I have a directory that holds ~5000 .txt files, each about 2,400 in size.
I just want one filename from that directory; order does not matter.
The file will be processed and deleted.
This is not the scripts working directory.
The intention is:
to open that file,
read it,
do some stuff,
unlink it and then
loop to the next file.
My crude attempt does not check for only .txt files and also has to get all ~5000 filenames just for one filename. I am also possibly calling too many modules?
The Verify_Empty sub was intended to validate that there is a directory and there are files in it, but my attempts are failing, so here I am seeking assistance.
#!/usr/bin/perl -w
use strict;
use warnings;
use CGI;
use CGI ':standard';
print CGI::header();
use CGI::Carp qw(fatalsToBrowser warningsToBrowser);
###
use vars qw(@Files $TheFile $PathToFile);
my $ListFolder = CGI::param('openthisfolder');
Get_File($ListFolder);
###
sub Get_File {
    $ListFolder = shift;
    unless (Verify_Empty($ListFolder)) {
        opendir(DIR, $ListFolder);
        @Files = grep { $_ ne '.' && $_ ne '..' } readdir(DIR);
        closedir(DIR);
        foreach (@Files) {
            $TheFile = $_;
        }
        #### This is where I go off to process and unlink file (sub not here) ####
        $PathToFile = $ListFolder.'/'.$TheFile;
        OpenFileReadPrepare($PathToFile);
        #### After unlinked, the OpenFileReadPrepare sub loops back to this script.
    }
    else {
        print qq~No more files to process~;
        exit;
    }
    exit;
}
####
sub Verify_Empty {
    $ListFolder = shift;
    opendir(DIR, $ListFolder) or die "Not a directory";
    return scalar(grep { $_ ne "." && $_ ne ".." } readdir(DIR)) == 0;
    closedir(DIR);
}
Obviously I am very new at this. This method seems quite "hungry"?
Seems like a lot to grab one filename and process it!
Guidance would be great!
EDIT -Latest Attempt
my $dir = '..';
my @files = glob "$dir/*.txt";
for (0..$#files) {
    $files[$_] =~ s/\.txt$//;
}
my $PathAndFile = $files[0].'.txt';
print qq~$PathAndFile~;
This "works" but, it still gets all the filenames. None of the examples here, so far, have worked for me. I guess I will live with this for today until I figure it out. Perhaps I will revisit and see if anyone came up with anything better.
You could call readdir inside a while loop. That way readdir won't return all the files at once but only one at a time:
# opendir(DIR, ...);
my $first_file = "";
while (my $file = readdir(DIR)) {
    next if $file eq "." or $file eq "..";
    $first_file = $file;
    last;
}
print "$first_file\n"; # first file in directory
You're calling readdir in list context, which returns all of the directory entries. Call it in scalar context instead:
my $file;
while ( my $entry = readdir DIR ) {
    $file = $entry, last if $entry =~ /\.txt$/;
}
if ( defined $file ) {
    print "found $file\n";
    # process....
}
Additionally, you read the directory twice; once to see if it has any entries, then to process it. You don't really need to see if the directory is empty; you get that for free during the processing loop.
Unless I am greatly mistaken, what you want is just to iterate over the files in a directory, and all this about "first or last" and "order does not matter" and deleting files is just confusion about how to do this.
So, let me put it in a very simple way for you, and see if that actually does what you want:
my $directory = "somedir";
for my $file (<$directory/*.txt>) {
    # do stuff with the files
}
The glob does the same as a *nix shell would: it lists the files with the .txt extension. If you want to do further tests on the files inside the loop, that is perfectly fine.
The downside is keeping 5000 file names in memory, and also that if processing this file list takes time, there is a possibility that it conflicts with other processes that also access these files.
An alternative is to simply read the files with readdir in a while loop, as mpapec mentioned in his answer. The benefit is that each time you read a new file name, the file will be there. Also, you won't have to keep a large list of files in memory.
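To tie the thread together, here is a minimal sketch of that readdir-in-a-while-loop approach applied to the original workflow; the directory path and the process_file() routine are placeholders for whatever processing you actually do:

use strict;
use warnings;

my $dir = '/path/to/textfiles';   # placeholder: the directory holding the .txt files
opendir(my $dh, $dir) or die "Can't open $dir: $!";

while (defined(my $entry = readdir $dh)) {   # scalar context: one name per call
    next unless $entry =~ /\.txt$/;          # only .txt files
    my $path = "$dir/$entry";                # readdir returns basenames only
    next unless -f $path;

    process_file($path);                     # placeholder for "do some stuff"
    unlink $path or warn "Couldn't unlink $path: $!";

    last;   # stop after one file; drop this line to handle them all
}
closedir $dh;

sub process_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't read $path: $!";
    # ... read the file and do some stuff ...
    close $fh;
}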

Determining why -f tag is saying only .pl files are files

The program I'm writing is supposed to open two directories and read through the files in them to compare the contents of those files. Then the functions that have changed in the files should be printed to a file. This program will mainly be checking .cpp files and .h files.
Currently I am trying to go through the directory and open the current file I am at to print the functions that have changed. However, I keep getting an error that states that the file is not a file and can't be opened.
Here is part of my current code that I am using
use strict;
use warnings;
use diagnostics -verbose;
use File::Compare;
use Text::Diff;

my $newDir = 'C:\Users\kkahla\Documents\Perl\TestFiles2';
my $oldDir = 'C:\Users\kkahla\Documents\Perl\TestFiles';

chomp $newDir;
$newDir =~ s#/$##;
chomp $oldDir;
$oldDir =~ s#/$##;

# Checks to make sure they are directories
unless (-d $newDir or -d $oldDir) {
    print STDERR "Invalid directory path for one of the directories";
    exit(0);
}

# Makes a directory for the outputs to go to unless one already exists
mkdir "Outputs", 0777 unless -d "Outputs";

# opens output file
open (OUTPUTFILE, ">Outputs\\diffDirectoriesOutput.txt");
print OUTPUTFILE "Output statistics for comparing two directories\n\n";

# opens both directories
opendir newDir, $newDir;
my @allNewFiles = grep { $_ ne '.' and $_ ne '..' } readdir newDir;
closedir newDir;

opendir oldDir, $oldDir;
my @allOldFiles = grep { $_ ne '.' and $_ ne '..' } readdir oldDir;
closedir oldDir;
Here is where I want to open the files to read through them:
elsif ((File::Compare::compare("$newDir/$_", "$oldDir/$_") == 1)) {
    print OUTPUTFILE "File: $_ has been update. Please check marked functions for differences\n\n";
    diff "$newDir/$_", "$oldDir/$_", { STYLE => "Table", OUTPUT => \*OUTPUTFILE };

    # Here is where I want to open the file, but when I try it throws an error.
    # Here are the two opens I have tried:
    open (FILE, "<$newDir/$_") or die "Can't open file"; # first attempt
    open (FILE, "<$_") or die "Can't open file"; # second attempt to see if it worked
}
I tried adding the flags
my @allNewFiles = grep { $_ ne '.' and $_ ne '..' && -e $_ } readdir newDir;
my @allNewFiles = grep { $_ ne '.' and $_ ne '..' && -f $_ } readdir newDir;
But that would simply remove all files that didn't have the .pl file extension. I tested it on some simple directories I have that contain two copies each of .txt, .cpp, .h, .c, .py, and .pl files, and it would only show that the .pl files were files.
I am new to Perl and any help would be appreciated.
-f is returning undef with $! set to "No such file or directory" because you are passing a file name to -f instead of a path to the file.
Change
-f $_
to
-f "$newDir/$_"

Using Perl to rename files in a directory

I'd like to take a directory and for all email (*.msg) files, remove the 'RE ' at the beginning. I have the following code but the rename fails.
opendir(DIR, 'emails') or die "Cannot open directory";
@files = readdir(DIR);
closedir(DIR);

for (@files) {
    next if $_ !~ m/^RE .+msg$/;
    $old = $_;
    s/RE //;
    rename($old, $_) or print "Error renaming: $old\n";
}
If your ./emails directory contains these files:
1.msg
2.msg
3.msg
then your @files will look something like ('.', '..', '1.msg', '2.msg', '3.msg') but your rename wants names like 'emails/1.msg', 'emails/2.msg', etc. So you can chdir before renaming:
chdir('emails');
for (@files) {
    # ...
}
You'd probably want to check the chdir return value too.
Or add the directory names yourself:
rename('emails/' . $old, 'emails/' . $_) or print "Error renaming $old: $!\n";
# or rename("emails/$old", "emails/$_") if you like string interpolation
# or you could use map if you like map
You might want to combine your directory reading and filtering using grep:
my @files = grep { /^RE .+msg$/ } readdir(DIR);
or even this:
opendir(DIR, 'emails') or die "Cannot open directory";
for (grep { /^RE .+msg$/ } readdir(DIR)) {
    (my $new = $_) =~ s/^RE //;
    rename("emails/$_", "emails/$new") or print "Error renaming $_ to $new: $!\n";
}
closedir(DIR);
You seem to be assuming glob-like behavior rather than readdir-like behavior.
The underlying readdir system call returns just the filenames within the directory, and will include the two entries '.' and '..'. This carries through to the readdir function in Perl, just to give a bit more detail on mu's answer.
Alternately, there's not much point to using readdir if you're collecting all the results in an array anyways.
@files = glob('emails/*');
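Building on that, a minimal sketch of the whole rename using glob, so the paths already carry the emails/ prefix and no chdir is needed (File::Basename is used here only as one convenient way to split off the directory):

use strict;
use warnings;
use File::Basename qw(basename dirname);

for my $old (glob('emails/*.msg')) {          # paths like 'emails/RE 1.msg'
    my $name = basename($old);
    next unless $name =~ /^RE .+\.msg$/;      # only files starting with 'RE '
    (my $newname = $name) =~ s/^RE //;
    my $new = dirname($old) . "/$newname";
    rename($old, $new) or print "Error renaming $old to $new: $!\n";
}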
As already mentioned, your script fails because the path you expect and the path the script uses are not the same.
I would suggest a more transparent usage. Hardcoding a directory is not a good idea, IMO, as I learned the day I made a script with a hardcoded path to alter some original files, and a colleague of mine thought it would be a nice script to borrow to alter his copies. Oops!
Usage:
perl script.pl "^RE " *.msg
i.e. a regex, then a file glob list, where the path is given in relation to the script, e.g. *.msg, emails/*.msg or even /home/pat/emails/*.msg /home/foo/*.msg (multiple globs are possible).
Using the absolute paths will leave the user with no doubt as to which files he'll be affecting, and it will also make the script reusable.
Code:
use strict;
use warnings;
use v5.10;
use File::Copy qw(move);

my $rx = shift;                       # e.g. "^RE "

if ($ENV{OS} =~ /^Windows/) {         # Patch for Windows' lack of shell globbing
    @ARGV = map glob, @ARGV;
}

for (@ARGV) {
    if (/$rx/) {
        my $new = s/$rx//r;           # Using non-destructive substitution
        say "Moving $_ to $new ...";
        move($_, $new) or die $!;
    }
}
I don't know if the regex fits the specific names of the files, but in one line this could be done with:
perl -E'for (</path/to/emails*.*>){ ($new = $_) =~ s/(^RE)(.*$)/$2/; say $_." -> ".$new }'
(say ... is nice for testing, just replace it with rename $_,$new or rename($_,$new) )
<*.*> reads every file in the given directory
($new = $_) =~ saves the result of the following substitution in $new and leaves $_ intact
(^RE) saves this match in $1 (optional) and only matches files with "RE" at the beginning
(.*$) saves everything up to and including the end ($) of the line into $2
the substitution replaces the match with the string in $2

Perl program help on opendir and readdir

So I have a program that I want to clean some text files. The program asks for the user to enter the full pathway of a directory containing these text files. From there I want to read the files in the directory, print them to a new file (that is specified by the user), and then clean them in the way I need. I have already written the script to clean the text files.
I ask the user for the directory to use:
chomp ($user_supplied_directory = <STDIN>);
opendir (DIR, $user_supplied_directory);
Then I need to read the directory.
my @dir = readdir DIR;
foreach (@dir) {
Now I am lost.
Any help please?
I'm not certain of what you want, so I made some assumptions:
When you say clean the text file, you meant delete the text file
The names of the files you want to write into are formed by a pattern.
So, if I'm right, try something like this:
chomp ($user_supplied_directory = <STDIN>);
opendir (DIR, $user_supplied_directory);

my @dir = readdir DIR;

foreach (@dir) {
    next if (($_ eq '.') || ($_ eq '..'));

    # Reads the content of the original file
    open FILE, $_;
    $contents = <FILE>;
    close FILE;

    # Here you supply the new filename
    $new_filename = $_ . ".new";

    # Writes the content to the new file
    open FILE, '>' . $new_filename;
    print FILE $contents;
    close FILE;

    # Deletes the old file
    unlink $_;
}
I would suggest that you switch to File::Find. It can be a bit of a challenge in the beginning but it is powerful and cross-platform.
But, to answer your question, try something like:
my @files = readdir DIR;
foreach $file (@files) {
    foo("$user_supplied_directory/$file");
}
where "foo" is whatever you need to do to the files. A few notes might help:
using "#dir" as the array of files was a bit misleading
the folder name needs to be prepended to the file name to get the right file
it might be convenient to use grep to throw out unwanted files and subfolders, especially ".."
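Putting those notes together, a minimal sketch (foo() is still just a placeholder for whatever cleaning you do to each file):

use strict;
use warnings;

chomp(my $user_supplied_directory = <STDIN>);
opendir(my $dh, $user_supplied_directory)
    or die "Can't open $user_supplied_directory: $!";

# throw out '.' and '..' (and anything else unwanted) up front
my @files = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

foreach my $file (@files) {
    # prepend the folder name so we operate on the right file
    foo("$user_supplied_directory/$file");
}

sub foo {
    my ($path) = @_;
    # ... whatever cleaning you need to do to each file ...
    print "would process $path\n";
}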
I wrote something today that used readdir. Maybe you can learn something from it. This is just a part of a (somewhat) larger program:
our @Perls = ();

{
    my $perl_rx = qr{ ^ perl [\d.]+ $ }x;

    for my $dir (split(/:/, $ENV{PATH})) {
        ### scanning: $dir
        my $relative = ($dir =~ m{^/});
        my $dirpath = $relative ? $dir : "$cwd/$dir";
        unless (chdir($dirpath)) {
            warn "can't cd to $dirpath: $!\n";
            next;
        }
        opendir(my $dot, ".") || next;
        while ($_ = readdir($dot)) {
            next unless /$perl_rx/o;
            ### considering: $_
            next unless -f;
            next unless -x _;
            ### saving: $_
            push @Perls, "$dir/$_";
        }
    }
}

{
    my $two_dots = qr{ [.] .* [.] }x;
    if (grep /$two_dots/, @Perls) {
        @Perls = grep /$two_dots/, @Perls;
    }
}

{
    my (%seen, $dev, $ino);
    @Perls = grep {
        ($dev, $ino) = stat $_;
        ! $seen{$dev, $ino}++;
    } @Perls;
}
The crux is push(@Perls, "$dir/$_"): filenames read by readdir are basenames only; they are not full pathnames.
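A small illustration of that point, with /tmp standing in for any directory:

opendir(my $dh, '/tmp') or die "Can't open /tmp: $!";
my @names = readdir $dh;                   # basenames only, e.g. 'foo.txt'
my @paths = map { "/tmp/$_" } @names;      # full paths, e.g. '/tmp/foo.txt'
closedir $dh;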
You can do the following, which allows the user to supply their own directory or, if no directory is specified by the user, it defaults to a designated location.
The example shows the use of opendir and readdir, stores all files in the directory in the @files array, and stores only the files that end with '.txt' in the @keys array. The while loop ensures that the full paths to the files are stored in the arrays.
This assumes that your "text files" end with the ".txt" suffix. I hope that helps, as I'm not quite sure what's meant by "cleaning the files".
use feature ':5.24';
use File::Copy;

my $dir = shift || "/some/default/directory";

opendir(my $dh, $dir) || die "Can't open $dir: $!";
while ( readdir $dh ) {
    push( @files, "$dir/$_" );
}

# store ".txt" files in new array
foreach $file ( @files ) {
    push( @keys, $file ) if $file =~ /(\S+\.txt\z)/g;
}

# Move files to new location, even if it's across different devices
for ( @keys ) {
    move($_, "/some/other/directory/") or die "Couldn't move files: $!\n";
}
See the perldoc of File::Copy for more info.

Why does this program fail to copy files?

This morning, my friend and I discussed and wrote the code below. The idea behind this Perl script is to create the directory structure and copy the files to the corresponding directory.
#!/usr/bin/perl
use File::Path;
use File::Copy;
use Path::Class;
use File::Basename qw/dirname/;

my $src = "/Vijay/new.txt";

unless (open(MYFILE, "file1")) {
    die ("cannot open input file file1\n");
}

$line = <MYFILE>;
while ($line ne "") {
    print ($line);
    mkdir_and_copy($src, $line);
    $line = <MYFILE>;
}

sub mkdir_and_copy {
    my ($from, $to) = @_;
    my ($directory, $filename) = $to =~ m/(.*\/)(.*)$/;
    print("creating dir $directory");
    system "mkdir -p $directory";
    print("copying file $from to $to");
    system "cp -f $from $to";
    return;
}
The above piece of code creates the directory structure, but fails to copy the files to the corresponding directory. Could you please let us know where exactly we are wrong?
Contents of file1:
test/test1/test2/test.txt
Contents of new.txt:
Shell/Test/test1/test1.txt
Shell/Test/test2/test2.txt
Shell/Test/test3/test3.txt
Output:
> ./mypgm.pl
test/test1/test2/test.txt
creating dir test/test1/test2/copying file /Vijay/new.txt to test/test1/test2/test.txt
cp: cannot access /Vijay/new.txt: No such file or directory
>
The directory Vijay has the file new.txt with the above mentioned content.
Thanks in advance,
Vijay
Hello everyone,
I just modified my code. Please refer to the section of code below.
#!/usr/bin/perl
use File::Path;
use File::Copy;
use File::Basename qw/dirname/;

my $src = "./Vijay/new.txt";

unless (open(MYFILE, "file1")) {
    die ("cannot open input file file1\n");
}

$line = <MYFILE>;
while ($line ne "") {
    print ($line);
    print("\n");
    mkdir_and_copy($src, $line);
    $line = "";
}

sub mkdir_and_copy {
    my ($from, $to) = @_;
    my ($directory, $filename) = $to =~ m/(.*\/)(.*)$/;
    $temp = $directory . $filename;
    print("Creating directory $directory \n");
    if (! -d $directory) {
        mkpath($directory); # or die "Failed to create path";
    }
    printf("From: $from \n");
    printf("To: $temp \n");
    copy($from, $temp) or die "Failed to Copy";
    return;
}
Now it creates the exact directory structure and copies the file to the corresponding directory. Could you please tell me whether the above code is proper?
Your goal is not clear to me, but perhaps this will help you solve the problem:
# Perl scripts should always include this.
# Your original script was generating some useful warnings.
use strict;
use warnings;

my $src = "/Vijay/new.txt";
my $f1  = 'file1';

# This is the recommended way to open a file --
# that is, using a lexical file handle.
open(my $file_handle, '<', $f1) or die "open() failed : $f1 : $!";

# This is the typical way of iterating over the lines in a file.
while (my $line = <$file_handle>){
    # You probably want to remove the newline
    # before passing the line to mkdir_and_copy()
    chomp $line;
    mkdir_and_copy($src, $line);
}

sub mkdir_and_copy {
    my ($from, $to) = @_;
    my ($directory, $filename) = $to =~ m/(.*\/)(.*)$/;

    # When writing a script that makes system() calls,
    # start by simply printing them. After everything
    # looks good, convert the print commands to system() calls.
    print "system(): mkdir -p $directory", "\n";
    print "system(): cp -f $from $to", "\n";

    # The return is not needed.
}
When I run the script with the inputs you provided, here's the output:
system(): mkdir -p test/test1/test2/
system(): cp -f /Vijay/new.txt test/test1/test2/test.txt
This can't be your intent. In particular, why are you iterating over file1 when it contains only one line? Perhaps you meant to iterate over new.txt?
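If iterating over new.txt is what you meant, a sketch of that might look like the following; $src is left as a placeholder for whichever file is really being copied, and the commands are still only printed, as above:

use strict;
use warnings;

my $src  = "/some/source/file";   # placeholder: the file that should be copied
my $list = "/Vijay/new.txt";

open(my $fh, '<', $list) or die "open() failed : $list : $!";
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /\S/;
    my ($directory) = $line =~ m/(.*\/)/;   # everything up to the last slash
    next unless defined $directory;
    print "system(): mkdir -p $directory", "\n";
    print "system(): cp -f $src $line", "\n";
}
close $fh;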
The first thing to do if something "doesn't work" is to catch errors and look at them, then to investigate the contents of variables. In your case the variable $to just contains the file name, so the script copies it into the current working directory, I'd imagine, not into the newly created directory.
HOWEVER, the methods you're using to get your job done are not exactly the best. It would be better to actually use File::Path and File::Copy, and in particular your way of splitting a path into directory and filename at the first slash is anything but general. This sort of thing should be done in libraries, of which Perl has many.
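For example, a minimal library-based sketch, using File::Basename to split the path and File::Path plus File::Copy instead of shelling out (the file names are the ones from the question):

use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Path qw(make_path);
use File::Copy qw(copy);

my $src = "/Vijay/new.txt";

open(my $fh, '<', 'file1') or die "cannot open file1: $!";
while (my $to = <$fh>) {
    chomp $to;
    next unless $to =~ /\S/;
    my ($filename, $directory) = fileparse($to);   # split into name and directory
    make_path($directory) unless -d $directory;    # like mkdir -p
    copy($src, $to) or warn "copy '$src' -> '$to' failed: $!";
}
close $fh;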
I'll bet your $line variable still has a newline appended to it. The input returned from the filehandle input operator (<MYFILE>) includes the record separator (usually the newline character(s) for your OS). Try this:
$line = <MYFILE>;
chomp($line);