Data driven perl script - perl

I want to list file n folder in directory. Here are the list of the file in this directory.
Output1.sv
Output2.sv
Folder1
Folder2
file_a
file_b
file_c.sv
But some of them, i don't want it to be listed. The list of not included file, I list in input.txt like below. Note:some of them is file and some of them is folder
NOT_INCLUDED=file_a
NOT_INCLUDED=file_b
NOT_INCLUDED=file_c.sv
Here is the code.
#!/usr/intel/perl
use strict;
use warnings;
my $input_file = "INPUT.txt";
open ( OUTPUT, ">OUTPUT.txt" );
file_in_directory();
close OUTPUT;
sub file_in_directory {
my $path = "experiment/";
my #unsort_output;
my #not_included;
open ( INFILE, "<", $input_file);
while (<INFILE>){
if ( $_ =~ /NOT_INCLUDED/){
my #file = $_;
foreach my $file (#file) {
$file =~ s/NOT_INCLUDED=//;
push #not_included, $file;
}
}
}
close INFILE;
opendir ( DIR, $path ) || die "Error in opening dir $path\n";
while ( my $filelist = readdir (DIR) ) {
chomp $filelist;
next if ( $filelist =~ m/\.list$/ );
next if ( $filelist =~ m/\.swp$/ );
next if ( $filelist =~ s/\.//g);
foreach $_ (#not_included){
chomp $_;
my $not_included = "$_";
if ( $filelist eq $not_included ){
next;
}
push #unsort_output, $filelist;
}
closedir(DIR);
my #output = sort #unsort_output;
print OUTPUT #output;
}
The output that I want is to list all the file in that directory except the file list in input.txt 'NOT_INCLUDED'.
Output1.sv
Output2.sv
Folder1
Folder2
But the output that i get seem still included that unwanted file.

This part of the code makes no sense:
while ( my $filelist = readdir (DIR) ) {
...
foreach $_ (#not_included){
chomp $_;
my $not_included = "$_";
if ( $filelist eq $not_included ){
next;
} # (1)
push #unsort_output, $filelist; # (2)
}
This code contains three opening braces ({) but only two closing braces (}). If you try to run your code as-is, it fails with a syntax error.
The push line (marked (2)) is part of the foreach loop, but indented as if it were outside. Either it should be indented more (to line up with (1)), or you need to add a } before it. Neither alternative makes much sense:
If push is outside of the foreach loop, then the next statement (and the whole foreach loop) has no effect. It could just be deleted.
If push is inside the foreach loop, then every directory entry ($filelist) will be pushed multiple times, one for each line in #not_included (except for the names listed somewhere in #not_included; those will be pushed one time less).
There are several other problems. For example:
$filelist =~ s/\.//g removes all dots from the file name, transforming e.g. file_c.sv into file_csv. That means it will never match NOT_INCLUDED=file_c.sv in your input file.
Worse, the next if s/// part means the loop skips all files whose names contain dots, such as Output1.sv or Output2.sv.
Results are printed without separators, so you'll get something like
Folder1Folder1Folder1Folder2Folder2Folder2file_afile_afile_bfile_b in OUTPUT.txt.
Global variables are used for no reason, e.g. INFILE and DIR.
Here is how I would structure the code:
#!/usr/intel/perl
use strict;
use warnings;
my $input_file = 'INPUT.txt';
my %is_blacklisted;
{
open my $fh, '<', $input_file or die "$0: $input_file: $!\n";
while (my $line = readline $fh) {
chomp $line;
if ($line =~ s!\ANOT_INCLUDED=!!) {
$is_blacklisted{$line} = 1;
}
}
}
my $path = 'experiment';
my #results;
{
opendir my $dh, $path or die "$0: $path: $!\n";
while (my $entry = readdir $dh) {
next
if $entry eq '.' || $entry eq '..'
|| $entry =~ /\.list\z/
|| $entry =~ /\.swp\z/
|| $is_blacklisted{$entry};
push #results, $entry;
}
}
#results = sort #results;
my $output_file = 'OUTPUT.txt';
{
open my $fh, '>', $output_file or die "$0: $output_file: $!\n";
for my $result (#results) {
print $fh "$result\n";
}
}
The contents of INPUT.txt (more specifically, the parts after NOT_INCLUDED=) are read into a hash (%is_blacklisted). This allows easy lookup of entries.
Then we process the directory entries. We skip over . and .. (I assume you don't want those) as well as all files ending with *.list or *.swp (that was in your original code). We also skip any file that is blacklisted, i.e. that was specified as excluded in INPUT.txt. The remaining entries are collected in #results.
We sort our results and write them to OUTPUT.txt, one entry per line.

Not deviating too much from your code, here is the solution. Please find the comments:
#!/usr/intel/perl
use strict;
use warnings;
my $input_file = "INPUT.txt";
open ( OUTPUT, ">OUTPUT.txt" );
file_in_directory();
close OUTPUT;
sub file_in_directory {
my $path = "experiment/";
my #unsort_output;
my %not_included; # creating hash map insted of array for cleaner and faster implementaion.
open ( INFILE, "<", $input_file);
while (my $file = <INFILE>) {
if ($file =~ /NOT_INCLUDED/) {
$file =~ s/NOT_INCLUDED=//;
$not_included{$file}++; # create a quick hash map of (filename => 1, filename2 => 1)
}
}
close INFILE;
opendir ( DIR, $path ) || die "Error in opening dir $path\n";
while ( my $filelist = readdir (DIR) ) {
next if $filelist =~ /^\.\.?$/xms; # discard . and .. files
chomp $filelist;
next if ( $filelist =~ m/\.list$/ );
next if ( $filelist =~ m/\.swp$/ );
next if ( $filelist =~ s/\.//g);
if (defined $not_included{$filelist}) {
next;
}
else {
push #unsort_output, $filelist;
}
}
closedir(DIR); # earlier the closedir was inside of while loop. Which is wrong.
my #output = sort #unsort_output;
print OUTPUT join "\n", #output;
}

Related

List file in directory and store in array. That array can be access outside the loop

I want to list the files in the folder and want to store in array. How can I make the array can be access outside the loop? I need that array to be outside as need to use it outside the array
This is the code:
use strict;
use warnings;
my $dirname = "../../../experiment/";
my $filelist;
my #file;
open ( OUTFILE, ">file1.txt" );
opendir ( DIR, $dirname ) || die "Error in opening dir $dirname\n";
while ( $filelist = readdir (DIR) ) {
next if ( $filelist =~ m/\.svh$/ );
next if ( $filelist =~ m/\.sv$/ );
next if ( $filelist =~ m/\.list$/ );
push #fileInDir, $_;
}
closedir(DIR);
print OUTFILE " ", #fileInDir;
close OUTFILE;
The error message is
Use of uninitialized value in print at file.pl line 49.
You're pushing the unitialized variable $_ onto the array rather than $filelist, which is misleadingly named (it's just one file name).
You can use:
use strict;
use warnings;
my $dirname = "../../../experiment/";
my $fname = "file1.txt";
my #files;
open my $out_fh, ">", $fname or die "Error in opening file $fname: $!";
opendir my $dir, $dirname or die "Error in opening dir $dirname: $!";
while (my $file = readdir($dir)) {
next if ($file =~ m/\.svh$/);
next if ($file =~ m/\.sv$/);
next if ($file =~ m/\.list$/);
push #files, $file;
}
print $out_fh join "\n", #files;
closedir $dir;
close $out_fh;

Can't find file trying to move

I'm trying to clean up a directory that contains a lot of sub directories that actually belong in some of the sub directories, not the main directory.
For example, there is
Main directory
sub1
sub2
sub3
HHH
And HHH belongs in sub3. HHH has multiple text files inside of it (as well as some ..txt and ...txt files that I would like to ignore), and each of these text files has a string
some_pattern [sub3].
So, I attempted to write a script that looks into the file and then moves it into its corresponding directory
use File::Find;
use strict;
use warnings;
use File::Copy;
my $DATA = "D:/DATA/DATA_x/*";
my #dirs = grep { -d } glob $DATA;
foreach (#dirs) {
if ($_ =~ m/HHH/) {
print "$_\n";
my $file = "$_/*";
my #files = grep { -f } glob $file;
foreach (#files) {
print "file $_\n";
}
foreach (#files) {
print "\t$_\n";
my #folders = split('/', $_);
if ($folders[4] eq '..txt' or $folders[4] eq '...txt') {
print "$folders[4] ..txt\n";
}
foreach (#folders) {
print "$_\n";
}
open(FH, '<', $_);
my $value;
while (my $line = <FH>) {
if ($line =~ m/some_pattern/) {
($value) = $line =~ /\[(.+?)\]/;
($value) =~ s/\s*$//;
print "ident'$value'\n";
my $new_dir = "$folders[0]/$folders[1]/$folders[2]/$value/$folders[3]/$folders[4]";
print "making $folders[0]/$folders[1]/$folders[2]/$value/$folders[3]\n";
print "file is $folders[4]\n";
my $new_over_dir = "$folders[0]/$folders[1]/$value/$folders[2]/$folders[3]";
mkdir $new_over_dir or die "Can't make it $!";
print "going to swap\n '$_'\n for\n '$new_dir'\n";
move($_, $new_dir) or die "Can't $!";
}
}
}
}
}
It's saying
Can't make it No such file or directory at foo.pl line 57, <FH> line 82.
Why is it saying that it won't make a file that doesn't exist?
A while later: here is my final script:
use File::Find;
use strict;
use warnings;
use File::Copy;
my $DATA = "D:/DATA/DATA_x/*";
my #dirs = grep { -d } glob $DATA;
foreach (#dirs) {
if ($_ =~ m/HHH/) {
my $value;
my #folders;
print "$_\n";
my $file = "$_/*";
my #files = grep { -f } glob $file;
foreach (#files) {
print "file $_\n";
}
foreach (#files) {
print "\t$_\n";
#folders = split('/', $_);
if ($folders[4] eq '..txt' or $folders[4] eq '...txt') {
print "$folders[4] ..txt\n";
}
foreach (#folders) {
print "$_\n";
}
open(FH, '<', $_);
while (my $line = <FH>) {
if ($line =~ m/some_pattern/) {
($value) = $line =~ /\[(.+?)\]/;
($value) =~ s/\s*$//;
print "ident'$value'\n";
}
}
}
if($value){
print "value $value\n";
my $dir1 = "/$folders[1]/$folders[2]/$folders[3]/$folders[4]/$folders[5]";
my $dir2 = "/$folders[1]/$folders[2]/$folders[3]/$folders[4]/$value";
system("cp -r $dir1 $dir2");
}
}
}
}
This works. It looks like part of my problem from before was that I was trying to run this on a directory in my D: drive--when I moved it to the C: drive, it worked fine without any permissions errors or anything. I did try to implement something with Path::Tiny, but this script was so close to being functional (and it was functional in a Unix environment), that I decided to just complete it.
You really should read the Path::Tiny doccu. It probably contains everything you need.
Some starting points, without error handling and so on...
use strict;
use warnings;
use Path::Tiny;
my $start=path('D:/DATA/DATA_x');
my $iter = path($start)->iterator({recurse => 1});
while ( $curr = $iter->() ) {
#select here the needed files - add more conditions if need
next if $curr->is_dir; #skip directories
next if $curr =~ m/HHH.*\.{2,3}txt$/; #skip ...?txt
#say "$curr";
my $content = $curr->slurp;
if( $content =~ m/some_pattern/ ) {
#do something wih the file
say "doing something with $curr";
my $newfilename = path("insert what you need here"); #create the needed new path for the file ..
path($newfilename->dirname)->mkpath; #make directories
$curr->move($newfilename); #move the file
}
}
Are you sure of the directory path you are trying to create. The mkdir call might be failing if some of the intermediate directories doesn't exist. If your code is robust to ensure that
the variable $new_over_dir contains the directory path you have to create, you can use method make_path from perl module File::Path to create the new directory, instead of 'mkdir'.
From the documentation of make_path:
The make_path function creates the given directories if they don't
exists before, much like the Unix command mkdir -p.

perl + read multiple csv files + manipulate files + provide output_files

Apologies if this is a bit long winded, bu i really appreciate an answer here as i am having difficulty getting this to work.
Building on from this question here, i have this script that works on a csv file(orig.csv) and provides a csv file that i want(format.csv). What I want is to make this more generic and accept any number of '.csv' files and provide a 'output_csv' for each inputed file. Can anyone help?
#!/usr/bin/perl
use strict;
use warnings;
open my $orig_fh, '<', 'orig.csv' or die $!;
open my $format_fh, '>', 'format.csv' or die $!;
print $format_fh scalar <$orig_fh>; # Copy header line
my %data;
my #labels;
while (<$orig_fh>) {
chomp;
my #fields = split /,/, $_, -1;
my ($label, $max_val) = #fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \#fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \#fields;
push #labels, $label;
}
}
for my $label (#labels) {
print $format_fh join(',', #{ $data{$label} }), "\n";
}
i was hoping to use this script from here but am having great difficulty putting the 2 together:
#!/usr/bin/perl
use strict;
use warnings;
#If you want to open a new output file for every input file
#Do it in your loop, not here.
#my $outfile = "KAC.pdb";
#open( my $fh, '>>', $outfile );
opendir( DIR, "/data/tmp" ) or die "$!";
my #files = readdir(DIR);
closedir DIR;
foreach my $file (#files) {
open( FH, "/data/tmp/$file" ) or die "$!";
my $outfile = "output_$file"; #Add a prefix (anything, doesn't have to say 'output')
open(my $fh, '>', $outfile);
while (<FH>) {
my ($line) = $_;
chomp($line);
if ( $line =~ m/KAC 50/ ) {
print $fh $_;
}
}
close($fh);
}
the script reads all the files in the directory and finds the line with this string 'KAC 50' and then appends that line to an output_$file for that inputfile. so there will be 1 output_$file for every inputfile that is read
issues with this script that I have noted and was looking to fix:
- it reads the '.' and '..' files in the directory and produces a
'output_.' and 'output_..' file
- it will also do the same with this script file.
I was also trying to make it dynamic by getting this script to work in any directory it is run in by adding this code:
use Cwd qw();
my $path = Cwd::cwd();
print "$path\n";
and
opendir( DIR, $path ) or die "$!"; # open the current directory
open( FH, "$path/$file" ) or die "$!"; #open the file
**EDIT::I have tried combining the versions but am getting errors.Advise greatly appreciated*
UserName#wabcl13 ~/Perl
$ perl formatfile_QforStackOverflow.pl
Parentheses missing around "my" list at formatfile_QforStackOverflow.pl line 13.
source dir -> /home/UserName/Perl
Can't use string ("/home/UserName/Perl/format_or"...) as a symbol ref while "strict refs" in use at formatfile_QforStackOverflow.pl line 28.
combined code::
use strict;
use warnings;
use autodie; # this is used for the multiple files part...
#START::Getting current working directory
use Cwd qw();
my $source_dir = Cwd::cwd();
#END::Getting current working directory
print "source dir -> $source_dir\n";
my $output_prefix = 'format_';
opendir my $dh, $source_dir; #Changing this to work on current directory; changing back
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
# .... old processing code here ...
## Start:: This part works on one file edited for this script ##
#open my $orig_fh, '<', 'orig.csv' or die $!; #line 14 and 15 above already do this!!
#open my $format_fh, '>', 'format.csv' or die $!;
#print $format_fh scalar <$orig_fh>; # Copy header line #orig needs changeing
print $format_file scalar <$orig_file>; # Copy header line
my %data;
my #labels;
#while (<$orig_fh>) { #orig needs changing
while (<$orig_file>) {
chomp;
my #fields = split /,/, $_, -1;
my ($label, $max_val) = #fields[1,12];
if ( exists $data{$label} ) {
my $prev_max_val = $data{$label}[12] || 0;
$data{$label} = \#fields if $max_val and $max_val > $prev_max_val;
}
else {
$data{$label} = \#fields;
push #labels, $label;
}
}
for my $label (#labels) {
#print $format_fh join(',', #{ $data{$label} }), "\n"; #orig needs changing
print $format_file join(',', #{ $data{$label} }), "\n";
}
## END:: This part works on one file edited for this script ##
}
How do you plan on inputting the list of files to process and their preferred output destination? Maybe just have a fixed directory that you want to process all the cvs files, and prefix the result.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $source_dir = '/some/dir/with/cvs/files';
my $output_prefix = 'format_';
opendir my $dh, $source_dir;
for my $file (readdir($dh)) {
next if $file !~ /\.csv$/;
next if $file =~ /^\Q$output_prefix\E/;
my $orig_file = "$source_dir/$file";
my $format_file = "$source_dir/$output_prefix$file";
.... old processing code here ...
}
Alternatively, you could just have an output directory instead of prefixing the files. Either way, this should get you on your way.

Perl loop through text files in a directory, locate file based on file name and print content

How do i loop through files in a directory, locate file based on file name and print file content?
Please see below code:
files in directory:
1234.txt
345.txt
234.txt
Code:
opendir (DIR, "LOCATION")|| die "cant open directory\n";
my #DATA = grep {(!/^\./)} readdir (DIR);
while ( my $file = shift #DATA) {
open FILE, "LOCATION";
while (FILE){
if ($file eq "235") {
print $_;
}
}
}
This should do (untested):
opendir( DIR, "/path/to/dir" );
while ( my $entry = readdir( DIR ) ) {
if ( $entry =~ /^$filenameImLookingFor$/ ) {
open( FILE, "$entry/$filenameImLookingFor" );
my #lines = <FILE>;
close( FILE );
print( join( '', #lines );
}
}
closedir( DIR );
The code in your question:
opendir (DIR, "LOCATION")|| die "cant open directory\n";
my #DATA = grep {(!/^\./)} readdir (DIR);
while ( my $file = shift #DATA) {
open FILE, "LOCATION";
while (FILE){
if ($file eq "235") {
print $_;
}
}
}
Will do this:
First it will handily find all files in directory "LOCATION" that do not begin with a period. Then it will iterate in a rather odd loop over each file name. The normal version of this loop would be:
for my $file (#DATA)
Then it will attempt to open the directory "LOCATION" again. This will likely fail, because "LOCATION" is a directory. Since you do not check the return value with die, this error will be silent.
What you probably want is to use
if ($file eq "235.txt") {
open my $fh, "<", $file or die $!;
print <$fh>;
}
This part:
while (FILE)
Is not actually checking the return value of readline(), it is checking whether the file handle is returning a true value. As near as I can tell on my system, it does return a true value even if the open failed. Which means of course that the loop will run indefinitely. What you probably meant was
while (<FILE>)
However, as explained earlier, this will only result in the error "readline() on unopened file handle FILE" since the open statement cannot open a directory.
Your check
if ($file eq "235")
Will never be true, since you said your file names had a .txt extension. You might instead do
if ($file eq "235.txt")
Which should work.
If you wanted to be clever, you could include your check directly in the grep:
my #files = grep { $_ eq "235.txt" } readdir DIR;
And since perl can use the <> diamond operator to print files listed in the #ARGV array, you can even do this
#ARGV = grep { $_ eq "235.txt" } #ARGV;
print <>;
Assuming you call the script with:
perl script.pl dir/*.txt
This is, of course, just the long version of doing:
perl -pe0 235.txt
Which is the long version of
cat 235.txt
So, I get the feeling you are trying to do something other than what your code implies.

I can't output properly

I'm trying to print a character from a file each time I get a char as input.
My problem is that it prints the whole line. I know it's a logic problem, I just can't figure out how to fix it.
use Term::ReadKey;
$inputFile = "input.txt";
open IN, $inputFile or die "I can't open the file :$ \n";
ReadMode("cbreak");
while (<IN>) {
$line = <IN>;
$char = ReadKey();
foreach $i (split //, $line) {
print "$i" if ($char == 0);
}
}
Move the ReadKey call into the foreach loop.
use strictures;
use autodie qw(:all);
use Term::ReadKey qw(ReadKey ReadMode);
my $inputFile = 'input.txt';
open my $in, '<', $inputFile;
ReadMode('cbreak');
while (my $line = <$in>) {
foreach my $i (split //, $line) {
my $char = ReadKey;
print $i;
}
}
END { ReadMode('restore') }
Your original code has 3 problems:
You only read the character once (outside the for loop)
You read 1 line from input file when testing while (<IN>) { (LOSING that line!) and then another in $line = <IN>; - therefore, only read even #d lines in your logic
print "$i" prints 1 line with no newline, therefore, you don't see characters separated
My scrip reads all the files in a directory, puts then in a list, chooses a random file from the given list.
After that, each time it gets an input char from the user, it prints a char from the file.
#!C:\perl\perl\bin\perl
use Term::ReadKey qw(ReadKey ReadMode);
use autodie qw(:all);
use IO::Handle qw();
use Fatal qw( open );
STDOUT->autoflush(1);
my $directory = "codes"; #directory's name
opendir (DIR, $directory) or die "I can't open the directory $directory :$ \n"; #open the dir
my #allFiles; #array of all the files
while (my $file = readdir(DIR)) { #read each file from the directory
next if ($file =~ m/^\./); #exclude it if it starts with '.'
push(#allFiles, $file); #add file to the array
}
closedir(DIR); #close the input directory
my $filesNr = scalar(grep {defined $_} #allFiles); #get the size of the files array
my $randomNr = int(rand($filesNr)); #generate a random number in the given range (size of array)
$file = #allFiles[$randomNr]; #get the file at given index
open IN, $file or die "I can't open the file :$ \n"; #read the given file
ReadMode('cbreak'); #don't print the user's input
while (my $line = <IN>) { #read each line from file
foreach my $i (split //, $line) { #split the line in characters (including \n & \t)
print "$i" if ReadKey(); #if keys are pressed, print the inexed char
}
}
END {
ReadMode('restore') #deactivate 'cbreak' read mode
}