Perl - Read multiple files and read line by line of the text file - perl

I am trying to read multiple .txt files in a folder. Each file should be read line by line, however, I failed to read multiple .txt files by using glob. Any advice on my code?
my %data;
@FILES = glob("*.txt");
$EmailMsg .= "EG. Folder(week) = Folder(CW01) --CW01 = Week 1 -- Number is week\n ";
$EmailMsg .= "=======================================================================================================\n";
# Try to Loop multiple files here
foreach my $file (@FILES) {
local $/ = undef;
open my $fh, '<', $file;
$data{$file} = <$fh>;
# Read the file one line at a time.
while (my $line = <$fh>) {
chomp $line;
$line =~ s/^\s+//;
$line =~ s/\s+$//;
my ($name, $date, $week) = split /\:/, $line;
if ($name eq "NoneFolder") {
$EmailMsg .= "Folder ($week) - No Folder created on the FTP! Failed to open folder!\n";
}
if ($name eq "EmptyFiles") {
$EmailMsg .= "Folder ($week) - No Files insides the folder! Failed download files!\n";
}
}
}
$EmailMsg .= "=======================================================================================================\n";
$EmailMsg .= "Please note that if you receive this email means that the script is running fine just that no folder is created or no files inside the folder for the week on the FTP.\n";
# close the file.
#close <$fh>;
Currently output:
EG. Folder(week) = Folder(CW01) --CW01 = Week 1 -- Number is week
=======================================================================================================
=======================================================================================================
Please note that if you receive this email means that the script is running fine just that no folder is created or no files inside the folder for the week on the FTP.
It failed to get any .txt files.

You are trying to read each file twice: firstly into the hash %data and then again line by line.
Once you have reached end of file, you have to either reopen the file or use seek to move the read pointer back to the beginning.
You also need to set $/ back to its original value, otherwise your loop will read the entire file instead of one line at a time.
It's not clear whether you really need the second copy of the file data in the hash, but you can avoid having to reset $/ by putting the change within a block, like this
open my $fh, '<', $file;
$data{$file} = do {
local $/ = undef;
<$fh>;
};
and then reset the file pointer to the start again before the while loop.
seek $fh, 0, 0;
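Putting the two pieces together, a minimal sketch might look like the following. The temporary file and its contents are made up for illustration (they only mimic the name:date:week layout from the question), and slurp_then_lines is a hypothetical helper name, not something from the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file so the sketch runs on its own.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "NoneFolder:2020-01-06:CW01\nEmptyFiles:2020-01-13:CW02\n";
close $tmp_fh or die "Cannot close $tmp_name: $!";

# Returns the whole file as one string plus a reference to its chomped lines.
sub slurp_then_lines {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";

    # Slurp: $/ is undefined only inside this do-block.
    my $whole = do { local $/; <$fh> };

    # Rewind so the line-by-line loop sees the data again.
    seek $fh, 0, 0 or die "Cannot seek in $file: $!";

    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    close $fh;
    return ($whole, \@lines);
}

my ($whole, $lines) = slurp_then_lines($tmp_name);
print scalar(@$lines), " lines, ", length($whole), " characters\n";
```

If the second copy in %data is not really needed, dropping the slurp and keeping only the while loop avoids the seek altogether.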

#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';
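my @files = ('Read a file.pl', 'Read a single text file.pl', 'Read only one file.pl',
'Read the file using while.pl', 'Reading the file.pl');
foreach my $i (@files) {
# Lexical filehandle and three-argument open; report the reason on failure.
open(my $fh, '<', $i) or die "Cannot open '$i': $!";
while (my $row = <$fh>) {
chomp $row;
print "$row\n";
}
close($fh);
}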

The file globbing works for me. You might want to specify scope for your @FILES variable and check that there actually are files matching the path you have specified,
#!/bin/env perl
use strict;
use warnings;
## glob on all files in home directory
## see: http://perldoc.perl.org/File/Glob.html
use File::Glob ':globally';
my @configs = <~myname/project/etc/*.cfg>;
foreach my $fn (@configs) {
print "file $fn\n";
}
your code,
my %data;
#here are some .c files,
my @FILES = glob("../*.c");
foreach my $fn (@FILES) {
print "file $fn\n";
}
exit;

This way catches more garbage for about the same amount of code.
my $PATH = shift @ARGV ;
chomp $PATH ;
opendir(TXTFILE,$PATH) || die ("failed to opendir: $PATH") ;
my @file = readdir TXTFILE ;
closedir(TXTFILE) ;
foreach(@file) {
next unless ($_ =~ /\.txt$/i) ; # Only get .txt files
$PATH =~ s/\/$//g ; $PATH =~ s/$/\// ; # Uniform trailing slash
my $thisfile = $PATH . $_ ; # now a fully qualified filename
unless (open(THISFILE,$thisfile)) { # Notify on busted files.
warn ("$thisfile failed to open") ;
next ;
}
while(<THISFILE>) {
# etc. etc.
}
close(THISFILE) ;
}

Related

can't loop through the whole thing to start at the beginning after it shows your results

I am really new to Perl and I am writing a program that reports the unique words in a text file. However, I don't know how to make it loop to ask the user for another file or to quit the program altogether.
I tried to put my whole code under a do until loop and it did not work
use 5.18.0;
use warnings;
use strict;
print "Enter the name of the file: ";
my %count;
my $userinput = <>; #the name of the text file the user wants to read
chomp($userinput); #take out the newline character
my $linenumb = $ARGV[1];
my $uniqcount = 0;
#opens the file if it is readable
open(FH, '<:encoding(UTF-8)', $userinput) or die "Could not open file '$userinput' $!";
print "Summary of file '$userinput': \n";
my ($lines, $wordnumber, $total) = (0, 0, 0);
my @words = ();
my $count =1;
while (my $line = <FH>) {
$lines++;
my @words = split (" ", $line);
$wordnumber = @words;
print "\n Line $lines : $wordnumber ";
$total = $total+$wordnumber;
$wordnumber++;
}
print "\nTotal no. of words in file are $total \n";
#my @uniq = uniq @words;
#print "Unique Names: " .scalar @uniq . "\n";
close(FH);
It's often a good idea to put complicated pieces of your code into subroutines so that you can forget (temporarily) how the details work and concentrate on the bigger picture.
I'd suggest that you have two obvious subroutines here that might be called get_user_input() and process_file(). Putting the code into subroutines might look like this:
sub get_user_input {
print "Enter the name of the file: ";
my $userinput = <>; #the name of the text file the user wants to read
chomp($userinput); #take out the newline character
return $userinput;
}
sub process_file {
my ($file) = @_;
#opens the file if it is readable
# Note: Changed to using a lexical filehandle.
# This will automatically be closed when the
# lexical variable goes out of scope (i.e. at
# the end of this subroutine).
open(my $fh, '<:encoding(UTF-8)', $file)
or die "Could not open file '$file' $!";
print "Summary of file '$file': \n";
# Removed $lines variable. We'll use the built-in
# variable $. instead.
# Moved declaration of $wordnumber inside the loop.
# Removed @words and $count variables that aren't used.
my $total = 0;
# Removed $line variable. We'll use $_ instead.
while (<$fh>) {
# With no arguments, split() defaults to
# behaving as split ' ', $_.
# When assigned to a scalar, split() returns
# the number of elements in the split list
# (which is what we want here - we never actually
# use the list of words).
my $wordnumber = split;
print "\n Line $. : $wordnumber ";
# $x += $y is a shortcut for $x = $x + $y.
$total += $wordnumber;
$wordnumber++;
}
print "\nTotal no. of words in file are $total \n";
}
And then you can plug them together with code something like this:
# Get the first filename from the user
my $filename = get_user_input();
# While the user hasn't typed 'q' to quit
while ($filename ne 'q') {
# Process the file
process_file($filename);
# Get another filename from the user
$filename = get_user_input();
}
Update: I've cleaned up the process_file() subroutine a bit and added comments about the changes I've made.
Wrap everything in a neverending loop and conditionally jump out of it.
while () {
my $prompt = …
last if $prompt eq 'quit';
… # file handling goes here
}
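As a rough sketch of that loop-until-quit idea, the version below also stops at end of input, which neither loop above checks for. The prompt_loop name, the 'q' sentinel and the simulated input string are assumptions made for illustration; in a real script you would pass \*STDIN and a handler that processes the file.

```perl
use strict;
use warnings;

# Reads names from $in_fh until 'q' or end of input; calls $handler for each one.
# Returns how many names were handled.
sub prompt_loop {
    my ($in_fh, $handler) = @_;
    my $handled = 0;
    while (1) {
        print "Enter the name of the file (q to quit): ";
        my $answer = <$in_fh>;
        last unless defined $answer;   # end of input, e.g. Ctrl-D
        chomp $answer;
        last if $answer eq 'q';
        $handler->($answer);
        $handled++;
    }
    return $handled;
}

# Demo with simulated keyboard input read from an in-memory filehandle.
my $typed = "first.txt\nsecond.txt\nq\nignored.txt\n";
open my $fake_stdin, '<', \$typed or die "Cannot open in-memory handle: $!";
my @seen;
my $n = prompt_loop($fake_stdin, sub { push @seen, $_[0] });
print "\nprocessed $n names: @seen\n";
```

Passing the handler as a code reference keeps the loop separate from the file processing, so the same loop can be reused or tested without touching real files.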

Data driven perl script

I want to list the files and folders in a directory. Here is the list of entries in this directory.
Output1.sv
Output2.sv
Folder1
Folder2
file_a
file_b
file_c.sv
But I don't want some of them to be listed. The excluded entries are listed in input.txt as shown below. Note: some of them are files and some are folders.
NOT_INCLUDED=file_a
NOT_INCLUDED=file_b
NOT_INCLUDED=file_c.sv
Here is the code.
#!/usr/intel/perl
use strict;
use warnings;
my $input_file = "INPUT.txt";
open ( OUTPUT, ">OUTPUT.txt" );
file_in_directory();
close OUTPUT;
sub file_in_directory {
my $path = "experiment/";
my @unsort_output;
my @not_included;
open ( INFILE, "<", $input_file);
while (<INFILE>){
if ( $_ =~ /NOT_INCLUDED/){
my @file = $_;
foreach my $file (@file) {
$file =~ s/NOT_INCLUDED=//;
push @not_included, $file;
}
}
}
close INFILE;
opendir ( DIR, $path ) || die "Error in opening dir $path\n";
while ( my $filelist = readdir (DIR) ) {
chomp $filelist;
next if ( $filelist =~ m/\.list$/ );
next if ( $filelist =~ m/\.swp$/ );
next if ( $filelist =~ s/\.//g);
foreach $_ (@not_included){
chomp $_;
my $not_included = "$_";
if ( $filelist eq $not_included ){
next;
}
push @unsort_output, $filelist;
}
closedir(DIR);
my @output = sort @unsort_output;
print OUTPUT @output;
}
The output that I want is to list all the file in that directory except the file list in input.txt 'NOT_INCLUDED'.
Output1.sv
Output2.sv
Folder1
Folder2
But the output that I get still seems to include the unwanted files.
This part of the code makes no sense:
while ( my $filelist = readdir (DIR) ) {
...
foreach $_ (@not_included){
chomp $_;
my $not_included = "$_";
if ( $filelist eq $not_included ){
next;
} # (1)
push @unsort_output, $filelist; # (2)
}
This code contains three opening braces ({) but only two closing braces (}). If you try to run your code as-is, it fails with a syntax error.
The push line (marked (2)) is part of the foreach loop, but indented as if it were outside. Either it should be indented more (to line up with (1)), or you need to add a } before it. Neither alternative makes much sense:
If push is outside of the foreach loop, then the next statement (and the whole foreach loop) has no effect. It could just be deleted.
If push is inside the foreach loop, then every directory entry ($filelist) will be pushed multiple times, one for each line in @not_included (except for the names listed somewhere in @not_included; those will be pushed one time less).
There are several other problems. For example:
$filelist =~ s/\.//g removes all dots from the file name, transforming e.g. file_c.sv into file_csv. That means it will never match NOT_INCLUDED=file_c.sv in your input file.
Worse, the next if s/// part means the loop skips all files whose names contain dots, such as Output1.sv or Output2.sv.
Results are printed without separators, so you'll get something like
Folder1Folder1Folder1Folder2Folder2Folder2file_afile_afile_bfile_b in OUTPUT.txt.
Global variables are used for no reason, e.g. INFILE and DIR.
Here is how I would structure the code:
#!/usr/intel/perl
use strict;
use warnings;
my $input_file = 'INPUT.txt';
my %is_blacklisted;
{
open my $fh, '<', $input_file or die "$0: $input_file: $!\n";
while (my $line = readline $fh) {
chomp $line;
if ($line =~ s!\ANOT_INCLUDED=!!) {
$is_blacklisted{$line} = 1;
}
}
}
my $path = 'experiment';
my @results;
{
opendir my $dh, $path or die "$0: $path: $!\n";
while (my $entry = readdir $dh) {
next
if $entry eq '.' || $entry eq '..'
|| $entry =~ /\.list\z/
|| $entry =~ /\.swp\z/
|| $is_blacklisted{$entry};
push @results, $entry;
}
}
@results = sort @results;
my $output_file = 'OUTPUT.txt';
{
open my $fh, '>', $output_file or die "$0: $output_file: $!\n";
for my $result (@results) {
print $fh "$result\n";
}
}
The contents of INPUT.txt (more specifically, the parts after NOT_INCLUDED=) are read into a hash (%is_blacklisted). This allows easy lookup of entries.
Then we process the directory entries. We skip over . and .. (I assume you don't want those) as well as all files ending with *.list or *.swp (that was in your original code). We also skip any file that is blacklisted, i.e. that was specified as excluded in INPUT.txt. The remaining entries are collected in @results.
We sort our results and write them to OUTPUT.txt, one entry per line.
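To see the hash lookup in isolation, here is a small sketch that works on in-memory lists instead of INPUT.txt and a real directory. The filter_entries name is made up for this example; the sample data is taken from the question.

```perl
use strict;
use warnings;

# Returns the sorted entries that are not named by a NOT_INCLUDED= line.
sub filter_entries {
    my ($config_lines, $entries) = @_;
    my %is_excluded;
    for my $line (@$config_lines) {
        my $name = $line;        # copy, so the caller's data is left untouched
        chomp $name;
        # s/// returns true only if the prefix was present and removed.
        $is_excluded{$name} = 1 if $name =~ s/\ANOT_INCLUDED=//;
    }
    my @kept = grep { !$is_excluded{$_} } @$entries;
    return sort @kept;
}

my @config  = ("NOT_INCLUDED=file_a\n", "NOT_INCLUDED=file_b\n", "NOT_INCLUDED=file_c.sv\n");
my @entries = qw(Output1.sv Output2.sv Folder1 Folder2 file_a file_b file_c.sv);
my @listed  = filter_entries(\@config, \@entries);
print "$_\n" for @listed;
```

Each directory entry costs one hash lookup, so there is no inner loop over the exclusion list and no chance of pushing the same entry more than once.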
Not deviating too much from your code, here is the solution. Please find the comments:
#!/usr/intel/perl
use strict;
use warnings;
my $input_file = "INPUT.txt";
open ( OUTPUT, ">OUTPUT.txt" );
file_in_directory();
close OUTPUT;
sub file_in_directory {
my $path = "experiment/";
my @unsort_output;
my %not_included; # creating a hash map instead of an array for a cleaner and faster implementation.
open ( INFILE, "<", $input_file);
while (my $file = <INFILE>) {
chomp $file; # strip the newline so the hash key matches the bare file name
if ($file =~ /NOT_INCLUDED/) {
$file =~ s/NOT_INCLUDED=//;
$not_included{$file}++; # create a quick hash map of (filename => 1, filename2 => 1)
}
}
close INFILE;
opendir ( DIR, $path ) || die "Error in opening dir $path\n";
while ( my $filelist = readdir (DIR) ) {
next if $filelist =~ /^\.\.?$/xms; # discard the . and .. entries
chomp $filelist;
next if ( $filelist =~ m/\.list$/ );
next if ( $filelist =~ m/\.swp$/ );
# the original "next if s/\.//g" line is dropped: it skipped every name containing a dot
if (defined $not_included{$filelist}) {
next;
}
else {
push @unsort_output, $filelist;
}
}
closedir(DIR); # earlier the closedir was inside the while loop, which is wrong.
my @output = sort @unsort_output;
print OUTPUT join "\n", @output;
}

perl not able to delete a file using Unlink

I am using a perl script that takes directory name as input from user and searches files in it. After searching file it reads the contents of file. If file contents contain a word "cricket" then using unlink function I should be able to delete the file. But using unlink the file that contains the word "cricket" still exists in the directory after execution of the code. Please help. My code is:
use strict;
use warnings;
use File::Basename;
print "enter a directory name\n";
my $dir = <>;
print "you have entered $dir \n";
chomp($dir);
opendir DIR, $dir or die "cannot open directory $!";
while (my $file = readdir(DIR)) {
next if ($file =~ m/^\./);
my $filepath = "${dir}${file}";
print "$filepath\n";
print " $file \n";
open(my $fh, '<', $filepath) or die "unable to open the $file $!";
my $count = 0;
while (my $row = <$fh>) {
chomp $row;
if ($row =~ /cricket/) {
$count++;
}
}
print "$count";
if ($count == 0) {
chomp($filepath);
unlink $filepath;
print " $filepath deleted";
}
}
By your test if($count==0) {...} you'll only delete files if they don't contain "cricket". It should work as you describe if you change it to if($count) {...}.
Additionally you're creating the filepath by concatenating the dir and file names in a manner that will only work if the dir name the user entered includes a trailing slash (${dir}${file}): this would be less error-prone as $dir/$file, or, if you wanted to go to town:
use File::Spec;
my $filepath = File::Spec->catfile($dir, $file);
Additionally, as the comments point out, you're not closing the open file handle, whether or not you try to delete it. This is bad practice, however, on Linux at least it should still work. Use close($fh) before your deletion test.
Note also that "cricket" is case-sensitive so files with "Cricket" won't be deleted. Use $row =~ /cricket/i for case-insensitive search.
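Pulling these points together, a hedged sketch might look like this. It builds its own temporary directory with two made-up files so it can run anywhere; delete_matching is a hypothetical helper name, and the -f test (skip anything that is not a plain file) is an addition not present in the original script.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Deletes every plain file in $dir whose contents match /cricket/i.
# Returns the sorted names of the files that were deleted.
sub delete_matching {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    my @names = grep { !/^\./ } readdir $dh;
    closedir $dh;

    my @deleted;
    for my $name (@names) {
        my $path = File::Spec->catfile($dir, $name);
        next unless -f $path;                 # skip subdirectories and the like
        open my $fh, '<', $path or die "Cannot open $path: $!";
        my $found = 0;
        while (my $row = <$fh>) {
            if ($row =~ /cricket/i) { $found = 1; last }
        }
        close $fh;                            # close before trying to delete
        next unless $found;
        unlink $path or die "Cannot delete $path: $!";
        push @deleted, $name;
    }
    return sort @deleted;
}

# Hypothetical demo data in a throwaway directory.
my $dir = tempdir(CLEANUP => 1);
my %content = ('a.txt' => "I like Cricket\n", 'b.txt' => "I like tennis\n");
for my $name (keys %content) {
    open my $out, '>', File::Spec->catfile($dir, $name) or die "Cannot create $name: $!";
    print {$out} $content{$name};
    close $out or die "Cannot close $name: $!";
}
print "deleted: $_\n" for delete_matching($dir);
```

Checking the return value of unlink is what would have revealed the original problem fastest: a failed delete reports its reason in $!, while a delete that is never attempted reports nothing.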

I can't output properly

I'm trying to print a character from a file each time I get a char as input.
My problem is that it prints the whole line. I know it's a logic problem, I just can't figure out how to fix it.
use Term::ReadKey;
$inputFile = "input.txt";
open IN, $inputFile or die "I can't open the file :$ \n";
ReadMode("cbreak");
while (<IN>) {
$line = <IN>;
$char = ReadKey();
foreach $i (split //, $line) {
print "$i" if ($char == 0);
}
}
Move the ReadKey call into the foreach loop.
use strictures;
use autodie qw(:all);
use Term::ReadKey qw(ReadKey ReadMode);
my $inputFile = 'input.txt';
open my $in, '<', $inputFile;
ReadMode('cbreak');
while (my $line = <$in>) {
foreach my $i (split //, $line) {
my $char = ReadKey;
print $i;
}
}
END { ReadMode('restore') }
Your original code has 3 problems:
You only read the character once (outside the for loop)
You read one line from the input file when testing while (<IN>) { (LOSING that line!) and then another in $line = <IN>; so your logic only ever sees the even-numbered lines
print "$i" prints 1 line with no newline, therefore, you don't see characters separated
My script reads all the files in a directory, puts them in a list, and chooses a random file from that list.
After that, each time it gets an input char from the user, it prints a char from the file.
#!C:\perl\perl\bin\perl
use Term::ReadKey qw(ReadKey ReadMode);
use autodie qw(:all);
use IO::Handle qw();
use Fatal qw( open );
STDOUT->autoflush(1);
my $directory = "codes"; #directory's name
opendir (DIR, $directory) or die "I can't open the directory $directory: $!\n"; #open the dir
my @allFiles; #array of all the files
while (my $file = readdir(DIR)) { #read each file from the directory
next if ($file =~ m/^\./); #exclude it if it starts with '.'
push(@allFiles, $file); #add file to the array
}
closedir(DIR); #close the input directory
my $filesNr = scalar(grep {defined $_} @allFiles); #get the size of the files array
my $randomNr = int(rand($filesNr)); #generate a random number in the given range (size of array)
my $file = "$directory/$allFiles[$randomNr]"; #get the file at the given index, prefixed with its directory
open IN, '<', $file or die "I can't open the file $file: $!\n"; #read the given file
ReadMode('cbreak'); #don't print the user's input
while (my $line = <IN>) { #read each line from file
foreach my $i (split //, $line) { #split the line in characters (including \n & \t)
print "$i" if ReadKey(); #if keys are pressed, print the indexed char
}
}
END {
ReadMode('restore') #deactivate 'cbreak' read mode
}

help merging perl code routines together for file processing

I need some perl help in putting these (2) processes/code to work together. I was able to get them working individually to test, but I need help bringing them together especially with using the loop constructs. I'm not sure if I should go with foreach..anyways the code is below.
Also, any best practices would be great too as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if i read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
next if ($file =~ /^\.+$/);
#Get filename attributes
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
print "$1\n";
print "$2\n";
print "$3\n";
}
print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
my $digest = md5_hex($data);
chomp;
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2" ;
$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
$data .= "$extra$eorec";
print NEWFILE "$data";
}
#print $data;
close (NEWFILE);
You are using an old style of Perl programming. I recommend using functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...
sub get_input_files {
# return an array of files (@)
}
sub extract_file_info {
# takes the file name and returns an array of values (filename attrs)
}
sub process_file {
# reads the input file, takes the previous attribs and build the output file
}
my @ifiles = get_input_files;
foreach my $ifile(@ifiles) {
my @attrs = extract_file_info($ifile);
process_file($ifile, @attrs);
}
Hope it helps
I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl
# - Never forget these!
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
# Parens on postfix "if" are optional; I prefer to omit them
next if $file =~ /^\.+$/;
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
process_file($file, $1, $2, $3);
}
print "$file\n";
}
sub process_file {
my ($orig_name, $foo_x, $name_x, $p_x) = @_;
my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";
# - From your description of the task, it sounds like we actually want to
# read from the found file, not from <>, so opening it here to read
# - Better to use lexical ("my") filehandle and three-arg form of open
# - "or" has lower operator precedence than "||", so less chance of
# things being grouped in the wrong order (though either works here)
# - Including $! in the error will tell why the file open failed
# - readdir returns bare names, so prepend $target_dir (which ends in "/")
open my $in_fh, '<', "$target_dir$orig_name" or die "cannot read $target_dir$orig_name: $!";
open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";
my $data = '';
my $line1 = <$in_fh>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
while (<$in_fh>) {
chomp;
my $digest = md5_hex($data);
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2";
$extra .= "$heading[$_]$sep1$values[$_]$sep2"
for (0 .. $#values); # $#values is the last index; scalar(@values) runs one past it
# - Useless use of double quotes removed on next two lines
$data .= $extra . $eorec;
#print $out_fh $data;
}
# - Moved print to output file to here (where it will print the complete
# output all at once) rather than within the loop (where it will print
# all previous lines each time a new line is read in) to prevent
# duplicate output records. This could also be achieved by printing
# $extra inside the loop. Printing $data at the end will be slightly
# faster, but requires more memory; printing $extra within the loop and
# getting rid of $data entirely would require less memory, so that may
# be the better option if you find yourself needing to read huge input
# files.
print $out_fh $data;
# - $in_fh and $out_fh will be closed automatically when it goes out of
# scope at the end of the block/sub, so there's no real point to
# explicitly closing it unless you're going to check whether the close
# succeeded or failed (which can happen in odd cases usually involving
# full or failing disks when writing; I'm not aware of any way that
# closing a file open for reading can fail, so that's just being left
# implicit)
close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.
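For the lower-memory option mentioned in the comments, one wrinkle is that each digest is computed over everything written so far, so $data cannot simply be dropped. One possible way around that, sketched below, is the object interface of Digest::MD5: keep a running context, add each record to it, and take the digest from a clone (hexdigest resets the object it is called on). The convert_streaming name and the in-memory sample CSV are assumptions for illustration only.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");

# Streaming variant: writes each record as soon as it is built,
# keeping only a running MD5 context instead of the whole output.
sub convert_streaming {
    my ($in_fh, $out_fh) = @_;
    my $header = <$in_fh>;
    return unless defined $header;            # empty input: nothing to do
    chomp $header;
    my @heading = split /,/, $header;

    my $ctx = Digest::MD5->new;               # digest of all records written so far
    while (my $line = <$in_fh>) {
        chomp $line;
        my @values = split /,/, $line;
        my $digest = $ctx->clone->hexdigest;  # clone, because hexdigest resets its object
        my $record = "__mykey__$sep1$digest$sep2";
        $record .= "$heading[$_]$sep1$values[$_]$sep2" for 0 .. $#values;
        $record .= $eorec;
        print {$out_fh} $record;
        $ctx->add($record);
    }
    return;
}

# Demo on in-memory filehandles with made-up data.
my $csv = "id,name\n1,alpha\n2,beta\n";
open my $in, '<', \$csv or die "Cannot open in-memory input: $!";
my $out_text = '';
open my $out, '>', \$out_text or die "Cannot open in-memory output: $!";
convert_streaming($in, $out);
close $out or die "Cannot close output: $!";
print "$out_text\n";
```

Taking filehandles rather than file names keeps the sub easy to try out; in the script above it would be called with $in_fh and $out_fh.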