Perl: Select filepath for CSV file inside folder

I have a Perl script which does some data manipulation on a selected CSV file. In the past, I have renamed the CSV file to match the name specified inside my script.
I now want to change it so that the sole file in a folder is selected, even though the CSV file is not always named the same. There will only ever be a single file in the folder.
I currently use this method:
my $filepath_in = 'C:\delete_csv_files\files_new\input.csv';
my $filepath_out = 'C:\delete_csv_files\files_processed\output.csv';
open my $in, '<:encoding(utf8)', $filepath_in or die;
open my $out, '>:encoding(utf8)', $filepath_out or die;
I also want the file to retain its original name after it's been processed.
Can anyone give me any pointers?

As suggested by toolic and commented by ikegami, you can use glob.
my ($filepath_in) = glob 'C:\delete_csv_files\files_new\*';
Then you can use a regex to generate the name of the output file, like :
(my $filepath_out = $filepath_in) =~ s!\\files_new\\!\\files_processed\\!;
This will give you a file with the same name, in directory files_processed.
If you want to force the name of the output file to output.csv, as in your code snippet, then use this regex instead:
(my $filepath_out = $filepath_in) =~ s!\\files_new\\.*$!\\files_processed\\output.csv!;
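Putting the two steps together, a minimal sketch might look like the following. It swaps glob for opendir/readdir so that it can also check that the folder really holds exactly one file, and it builds the output path with File::Basename and File::Spec instead of a regex. The helper names single_file_in and output_path_for are hypothetical, not something from the question.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: return the only plain file in $dir, or die
# if the folder does not hold exactly one file.
sub single_file_in {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    my @files = grep { -f $_ } map { File::Spec->catfile($dir, $_) } readdir $dh;
    closedir $dh;
    die "Expected exactly one file in $dir, found " . scalar(@files) . "\n"
        unless @files == 1;
    return $files[0];
}

# Hypothetical helper: keep the file name, swap the directory.
sub output_path_for {
    my ($filepath_in, $dir_out) = @_;
    return File::Spec->catfile($dir_out, basename($filepath_in));
}

# Usage with the folders from the question (commented out, no side effects):
# my $filepath_in  = single_file_in('C:\delete_csv_files\files_new');
# my $filepath_out = output_path_for($filepath_in, 'C:\delete_csv_files\files_processed');
# open my $in,  '<:encoding(utf8)', $filepath_in  or die "Cannot read $filepath_in: $!";
# open my $out, '>:encoding(utf8)', $filepath_out or die "Cannot write $filepath_out: $!";
```

Dying when the folder is empty or holds more than one file is a design choice of this sketch; the glob one-liner above would silently take the first match instead.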

Related

Get the path for a similarly named file in perl, where only the extension differs?

I'm trying to write an Automator service, so I can chuck this into a right-click menu in the gui.
I have a filepath to a txt file, and there is a similarly named file that varies only in the file extension. This can be a pdf or a jpg, or potentially any other extension, no way to know beforehand. How can I get the filepath to this other file (there will only be one such)?
$other_name =~ s/txt$/!(txt)/;
$other_name =~ s/ /?/g;
my @test = glob "$other_name";
In Bash, I'd just turn on the extglob option, change the "txt" at the end to "!(txt)", and then do glob expansion. But I'm not even sure if that's available in Perl. And since the filepaths always have spaces (in one of the near-root directory names), that further complicates things. I've read through the glob() documentation at http://perldoc.perl.org/functions/glob.html and tried every variation of quoting (the above example code shows my attempt after having given up, where I just replace every space with a ? wildcard).
It seems like I'm able to put modules inside the script, so this doesn't have to be bare perl (just ran a test).
Is there an elegant or at least simple way to accomplish this?
You can extract everything in the filename up to extension, then run a glob with that and filter out the unneeded .txt. This is one of those cases where you need to protect the pattern in the glob with a double set of quotes, for spaces.
use warnings;
use strict;
use feature qw(say);
my $file = "dir with space/file with spaces.txt";
# Pull the full name without extension
my ($basefname) = $file =~ m/(.*)\.txt$/;
# Get all files with that name and filter out unneeded (txt)
my @other_exts = grep { not /\.txt$/ } glob(qq{"$basefname.*"});
say for @other_exts;
With a toy structure like this
dir with space/
    file with spaces.pdf
    file with spaces.txt
The output is
dir with space/file with spaces.pdf
This recent post has more on related globs.
Perl doesn't allow the not substring construct in glob. You have to find all files with the same name and any extension, and remove the one ending with .txt
This program shows the idea. It splits the original file name into a stem part and a suffix part, and uses the stem to form a glob pattern. The grep removes any result that ends with the original suffix
It picks only the first matching file name if there is more than one candidate. $other_name will be set to undef if no matching file was found
The original file name is expected as a parameter on the command line
The result is printed to STDOUT; I don't know what you need for your right-click menu
The line use File::Glob ':bsd_glob' is necessary if you are working with file paths that contain spaces, as it seems you are
use strict;
use warnings 'all';
use File::Glob ':bsd_glob';
my ($stem, $suffix) = shift =~ /(.*)(\..*)/;
my ($other_name) = grep ! /\Q$suffix\E$/i, glob "$stem.*";
$other_name =~ tr/ /?/;
print $other_name, "\n";
This is an example, based on File::Basename core module
use File::Basename;
my $fullname = "/path/to/my/filename.txt";
my ($name, $path, $suffix) = fileparse($fullname, qw/.txt/);
my $new_filename = $path . $name . ".pdf";
# $name --> filename
# $path --> /path/to/my/
# $suffix --> .txt
# $new_filename --> /path/to/my/filename.pdf
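Since the other extension is not known in advance, the fileparse approach can be combined with a glob instead of hard-coding .pdf. The following is only a sketch: sibling_of is a hypothetical helper name, and it assumes the path contains no glob metacharacters such as [ or {. It calls bsd_glob from File::Glob directly, which treats the whole string as a single pattern, so spaces need no extra quoting.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Glob qw(bsd_glob);

# Hypothetical helper: given dir/name.txt, return the first file that
# shares the stem but has a different extension, or undef if none exists.
sub sibling_of {
    my ($fullname) = @_;
    my ($name, $path, $suffix) = fileparse($fullname, qr/\.[^.]*/);
    # Keep every match whose name does not end in the original suffix.
    my @candidates = grep { !/\Q$suffix\E\z/i } bsd_glob("$path$name.*");
    return $candidates[0];
}

# Usage (commented out, no side effects):
# my $other = sibling_of('dir with space/file with spaces.txt');
# print defined $other ? "$other\n" : "no sibling found\n";
```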

How can I create a new output file for each subfolder under a main folder using perl?

I have 100 subfolders in a main folder. They have different names. Each subfolder includes a .txt file, which has 10 columns. I want to get a new .txt file for each subfolder. Each new .txt file must be in its own folder. That is, I will have 2 .txt files (old and new) in each subfolder. I am trying to select the lines starting with "ATOM" and columns 2, 6, 7 and 8 from each .txt file. My code is the following. It doesn't work correctly: it doesn't create a new .txt file. How can I fix this problem?
#!/usr/bin/perl
$search_text = "ATOM";
@files = <*/*.txt>;
foreach $file (@files) {
    print $file . "\n";
    open(DATA, $file);
    open(OUT_FILE, ">$file a.txt");
    while ($line = <DATA>)
    {
        @fields = split /\s+/, $line;
        if ($line =~ m/$search_text/)
        {
            print OUT_FILE "$fields[2]\t$fields[6]\t$fields[7]\t$fields[8]\n";
        }
    }
}
close(OUT_FILE);
To put the output file a.txt into the same directory as the input file, you need to extract the directory name from the input file name, and prepend it to the output file name (a.txt). There are a couple of ways you can do that; probably the simplest is to use dirname() from the standard module File::Basename:
use File::Basename;
my $dir = dirname($file);
open(OUT_FILE, ">", "$dir/a.txt") or die "Failed to open $dir/a.txt: $!";
or you could use File::Spec directly:
use File::Spec;
my ($volume, $dir) = File::Spec->splitpath($file);
my $outname = File::Spec->catpath($volume, $dir, 'a.txt');
open(OUT_FILE, ">", $outname) or die "Failed to open $outname: $!";
or you could just use a regexp substitution:
my $outname = ( $file =~ s![^/]+$!a.txt!r );
open(OUT_FILE, ">", $outname) or die "Failed to open $outname: $!";
Ps. In any case, I'd recommend adopting several good habits that will help you write better Perl scripts:
Always start your scripts with use strict; and use warnings;. Fix any errors and warnings they produce. In particular, declare all your local variables with my to make them lexically scoped.
Check the return value of functions like open(), and abort the script if they fail. (I've done this in my examples above.)
Use the three-argument form of open(), as I also did in my examples above. It's a lot less likely to break if your filenames contain funny characters.
Consider using lexically scoped file handles (open my $out_file, ...) instead of global file handles (open OUT_FILE, ...). I didn't do that in my code snippets above, because I wanted to keep them compatible with the rest of your code, but it would be good practice.
If you're pre-declaring a regular expression, like your $search_text, use qr// instead of a plain string, like this:
my $search_text = qr/ATOM/;
It's slightly more efficient, and the quoting rules for special characters are much saner.
For printing multiple columns from an array, consider using join() and a list slice, as in:
print OUT_FILE join("\t", @fields[2,6,7,8]), "\n";
Finally, if I were you, I'd reconsider my file naming scheme: the output file name a.txt matches your input file name glob *.txt, so your script will likely break if you run it twice in a row.
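For reference, here is a rough sketch of how the whole loop could look with those habits applied. The helper name extract_atoms and the output name atoms.out are hypothetical; the output name is chosen only so that it does not match the *.txt glob. It assumes every ATOM line has at least nine whitespace-separated fields and keeps the question's zero-based indices 2, 6, 7 and 8.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: read $file, keep the lines starting with ATOM, and
# write fields 2, 6, 7 and 8 (zero-based) to $out_name in the same folder.
sub extract_atoms {
    my ($file, $out_name) = @_;
    my $outname = dirname($file) . "/$out_name";
    open my $in,  '<', $file    or die "Failed to open $file: $!";
    open my $out, '>', $outname or die "Failed to open $outname: $!";
    while (my $line = <$in>) {
        next unless $line =~ /^ATOM/;
        my @fields = split ' ', $line;
        print {$out} join("\t", @fields[2, 6, 7, 8]), "\n";
    }
    close $out or die "Failed to close $outname: $!";
    close $in;
    return $outname;
}

# Usage from the main folder (commented out, no side effects):
# extract_atoms($_, 'atoms.out') for glob '*/*.txt';
```

Opening the output file inside the per-file helper also means each subfolder gets its own file handle, which is closed before the next input file is touched.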

Perl regex search of file name and extensions against a predefined array

I want to filter out some files from a directory. I am able to grab the files and their extensions recursively, but now I want to match the file extension and file name against a predefined array of extensions and file names using a wildcard search, as one would in SQL.
my @ignore_exts = qw( .vmdk .iso .7z .bundle .wim .hd .vhd .evtx .manifest .lib .mst );
I want to filter out the files that have any of the extensions above.
E.g. the file name is abc.1.149_1041.mst, and since the extension .mst is present in @ignore_exts, I want it filtered out. The extension I am getting is '.1.149_1041.mst'. In SQL I would do something like select * from <some-table> where extension like '%.mst'. I want to do the same thing in Perl.
This is what I am using for grabbing the extension.
my $ext = (fileparse($filepath, '\..*?')) [2];
In order to pull a file extension off a filename this should work:
/^(.*)\.([^.]+)$/
$fileName = $1;
$extension = $2;
This might do the trick for you.
Input: a.b.c.text
$1 will be a.b.c
$2 will be text
Basically this will take everything from the start of the line until the last period and group that in the 1st group, and then everything from the last period to the end of the line as group 2
You can see a sample here: http://regex101.com/r/vX3dK1
As for checking whether the extension exists in the array read here: (How can I check if a Perl array contains a particular value?)
if (grep (/^$extension/, @array)) {
print "Extension Found\n";
}
Just turn your list of extensions into a regular expression, and then test against the $filepath.
my @ignore_exts = qw( .vmdk .iso .7z .bundle .wim .hd .vhd .evtx .manifest .lib .mst );
my $ignore_exts_re = '(' . join('|', map quotemeta, @ignore_exts) . ')$';
And then later to compare
if ($filepath =~ $ignore_exts_re) {
    print "Ignore $filepath because it ends in $1\n";
    next;
}
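The same idea can be wrapped in a small, self-contained sketch. The helper name is_ignored is hypothetical, and the /i flag (case-insensitive matching) is an assumption that was not part of the question. The pattern is precompiled with qr// and anchored to the end of the string, which behaves like the SQL '%.mst' comparison.

```perl
use strict;
use warnings;

my @ignore_exts = qw( .vmdk .iso .7z .bundle .wim .hd .vhd .evtx .manifest .lib .mst );

# Build one precompiled pattern that matches any listed extension at the end.
my $alternation    = join '|', map { quotemeta($_) } @ignore_exts;
my $ignore_exts_re = qr/(?:$alternation)\z/i;

# Hypothetical helper: true if the path ends in one of the ignored extensions.
sub is_ignored {
    my ($filepath) = @_;
    return $filepath =~ $ignore_exts_re ? 1 : 0;
}

# Usage (commented out):
# my @keep = grep { !is_ignored($_) } @all_files;   # @all_files is hypothetical
```

Because the test runs against the whole path, there is no need to extract the extension with fileparse first.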

Perl: Substitute text string with value from list (text file or scalar context)

I am a perl novice, but have read the "Learning Perl" by Schwartz, foy and Phoenix and have a weak understanding of the language. I am still struggling, even after using the book and the web.
My goal is to be able to do the following:
Search a specific folder (current folder) and grab filenames with full path. Save filenames with complete path and current foldername.
Open a template file and insert the filenames with full path at a specific location (e.g. using substitution) as well as current foldername (in another location in the same text file, I have not gotten this far yet).
Save the new modified file to a new file in a specific location (current folder).
I have many files/folders that I want to process and plan to copy the perl program to each of these folders so the perl program can make new .
I have gotten so far ...:
use strict;
use warnings;
use Cwd;
use File::Spec;
use File::Basename;
my $current_dir = getcwd;
open SECONTROL_TEMPLATE, '<secontrol_template.txt' or die "Can't open SECONTROL_TEMPLATE: $!\n";
my @secontrol_template = <SECONTROL_TEMPLATE>;
close SECONTROL_TEMPLATE;
opendir(DIR, $current_dir) or die $!;
my @seq_files = grep {
    /gz/
} readdir (DIR);
open FASTQFILENAMES, '> fastqfilenames.txt' or die "Can't open fastqfilenames.txt: $!\n";
my @fastqfiles;
foreach (@seq_files) {
    $_ = File::Spec->catfile($current_dir, $_);
    push(@fastqfiles,$_);
}
print FASTQFILENAMES @fastqfiles;
open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
my @secontrol;
foreach (@secontrol_template) {
    $_ =~ s/#/$fastqfilenames/eg;
    push(@secontrol,$_);
}
open SECONTROL, '> secontrol.txt' or die "Can't open SECONTROL: $!\n";
print SECONTROL @secontrol;
close SECONTROL;
close FASTQFILENAMES;
My problem is that I cannot figure out how to use my list of files to replace the "#" in my template text file:
my @secontrol;
foreach (@secontrol_template) {
    $_ =~ s/#/$fastqfilenames/eg;
    push(@secontrol,$_);
}
The substitute function will not replace the "#" with the list of files listed in $fastqfilenames. I get the "#" replaced with GLOB(0x8ab1dc).
Am I doing this the wrong way? Should I not use substitute as this can not be done, and then rather insert the list of files ($fastqfilenames) in the template.txt file? Instead of the $fastqfilenames, can I substitute with content of file (e.g. s/A/{r file.txt ...). Any suggestions?
Cheers,
JamesT
EDIT:
This made it all better.
foreach (@secontrol_template) {
    s/#/$fastqfilenames/g;
    push @secontrol, $_;
}
As both answers pointed out, $fastqfilenames was a filehandle.
Replacing this: open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
with this:
my $fastqfilenames = join "\n", @fastqfiles;
made it all good. Thanks to both of you.
$fastqfilenames is a filehandle. You have to read the information out of the filehandle before you can use it.
However, you have other problems.
You are printing all of the filenames to a file, then reading them back out of the file. This is not only a questionable design (why read from the file again, since you already have what you need in an array?), it also won't even work:
Perl buffers file I/O for performance reasons. The lines you have written to the file may not actually be there yet, because Perl is waiting until it has a large chunk of data saved up, to write it all at once.
You can override this buffering behavior in a few different ways (closing the file handle being the simplest if you are done writing to it), but as I said, there is no reason to reopen the file again and read from it anyway.
Also note, the /e option in a regex replacement evaluates the replacement as Perl code. This is not necessary in your case, so you should remove it.
Solution: Instead of reopening the file and reading it, just use the @fastqfiles variable you previously created when replacing in the template. It is not clear exactly what you mean by replacing # with the filenames.
Do you want to replace each # with a list of all filenames together? If so, you will probably need to join the filenames together in some way before doing the replacement.
Do you want to create a separate version of the template file for each filename? If so, you need an inner for loop that goes over each filename for each template. And you will need something other than a simple replacement, because the replacement will change the original string on the first time through. If you are on Perl 5.14 or later, you could use the /r option to replace non-destructively: push(@secontrol,s/#/$file_name/gr); Otherwise, you should copy to another variable before doing the replacement.
$_ =~ s/#/$fastqfilenames/eg;
$fastqfilenames is a file handle, not the file contents.
In any case, I recommend the use of Text::Template module in order to do this kind of work (file text substitution).
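If pulling in a module is not an option, the first interpretation (every # replaced by the full list of filenames) can be sketched in plain Perl as below. The helper name fill_template is hypothetical, and the sketch takes the # placeholder exactly as it appears in the question. It copies each line before substituting, so it also works on Perl versions without the /r flag and leaves the template array untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every '#' placeholder in the template lines
# with the newline-joined list of file names and return the new lines.
sub fill_template {
    my ($template_lines, $filenames) = @_;
    my $replacement = join "\n", @$filenames;
    return map { (my $copy = $_) =~ s/#/$replacement/g; $copy } @$template_lines;
}

# Usage with the variables from the question (commented out):
# my @secontrol = fill_template(\@secontrol_template, \@fastqfiles);
```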

how to copy a file(input from user) with the name edited to the same directory?

I would like to request a txt file from the user and create an exact copy of it, with an edited name, in the same location.
E.g. the user provides /file/works/done/abc.txt
The duplicated file needs to be /file/works/done/abc_edited.txt
I am able to duplicate the file. However, I can't get the name I want.
Assumption: $file is argument from user, eg: $file is /file/works/done/abc.txt
Code as below:
my $a = '_edited';
my $duplicatedfile = $file.$a;
copy($file,$duplicatedfile) or die "Failed to copy $file: $!\n";
After execution, the duplicated file is /file/works/done/abc.txt_edited
However the one that I wish to have is /file/works/done/abc_edited.txt
Show us some code and a problem you're having with it, but please don't ask us to write the whole thing for you. You might want to look at the File::Copy module for an easy-to-use "copy file" method.
Oh well, after reading your comment it looks like all you need is something like
my $new_file_name = $file;
$new_file_name =~ s/\.([^\.]+)$/_edited.$1/;
use File::Basename;
my $full_path = '/file/works/done/abc.txt';
my ($name, $path, $ext) = fileparse($full_path, qr/\.[^.]*/);
my $new_full_path = $path.$name.'_edited'.$ext;
print $new_full_path;
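Combining the fileparse approach with File::Copy gives something like the sketch below. The helper name copy_with_tag is hypothetical; it assumes the tag should go right before the last extension, and it falls back to appending the tag when the file has no extension at all.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Copy qw(copy);

# Hypothetical helper: copy /dir/abc.txt to /dir/abc_edited.txt and
# return the new path.
sub copy_with_tag {
    my ($file, $tag) = @_;
    my ($name, $path, $ext) = fileparse($file, qr/\.[^.]*/);
    my $new_file = $path . $name . $tag . $ext;
    copy($file, $new_file) or die "Failed to copy $file to $new_file: $!\n";
    return $new_file;
}

# Usage (commented out, no side effects):
# my $duplicatedfile = copy_with_tag($file, '_edited');
```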