help merging perl code routines together for file processing - perl

I need some perl help in putting these (2) processes/code to work together. I was able to get them working individually to test, but I need help bringing them together especially with using the loop constructs. I'm not sure if I should go with foreach..anyways the code is below.
Also, any best practices would be great too as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if i read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
next if ($file =~ /^\.+$/);
#Get filename attributes
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
print "$1\n";
print "$2\n";
print "$3\n";
}
print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my #heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
my $digest = md5_hex($data);
chomp;
my (#values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2" ;
$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(#values));
$data .= "$extra$eorec";
print NEWFILE "$data";
}
#print $data;
close (NEWFILE);

You are using an old-style of Perl programming. I recommend you to use functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...
sub get_input_files {
# return an array of files (#)
}
sub extract_file_info {
# takes the file name and returs an array of values (filename attrs)
}
sub process_file {
# reads the input file, takes the previous attribs and build the output file
}
my #ifiles = get_input_files;
foreach my $ifile(#ifiles) {
my #attrs = extract_file_info($ifile);
process_file($ifile, #attrs);
}
Hope it helps

I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl
# - Never forget these!
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
# Parens on postfix "if" are optional; I prefer to omit them
next if $file =~ /^\.+$/;
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
process_file($file, $1, $2, $3);
}
print "$file\n";
}
sub process_file {
my ($orig_name, $foo_x, $name_x, $p_x) = #_;
my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";
# - From your description of the task, it sounds like we actually want to
# read from the found file, not from <>, so opening it here to read
# - Better to use lexical ("my") filehandle and three-arg form of open
# - "or" has lower operator precedence than "||", so less chance of
# things being grouped in the wrong order (though either works here)
# - Including $! in the error will tell why the file open failed
open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";
my $data = '';
my $line1 = <$in_fh>;
chomp $line1;
my #heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
while (<$in_fh>) {
chomp;
my $digest = md5_hex($data);
my (#values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2";
$extra .= "$heading[$_]$sep1$values[$_]$sep2"
for (0 .. scalar(#values));
# - Useless use of double quotes removed on next two lines
$data .= $extra . $eorec;
#print $out_fh $data;
}
# - Moved print to output file to here (where it will print the complete
# output all at once) rather than within the loop (where it will print
# all previous lines each time a new line is read in) to prevent
# duplicate output records. This could also be achieved by printing
# $extra inside the loop. Printing $data at the end will be slightly
# faster, but requires more memory; printing $extra within the loop and
# getting rid of $data entirely would require less memory, so that may
# be the better option if you find yourself needing to read huge input
# files.
print $out_fh $data;
# - $in_fh and $out_fh will be closed automatically when it goes out of
# scope at the end of the block/sub, so there's no real point to
# explicitly closing it unless you're going to check whether the close
# succeeded or failed (which can happen in odd cases usually involving
# full or failing disks when writing; I'm not aware of any way that
# closing a file open for reading can fail, so that's just being left
# implicit)
close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.

Related

how to combine the code to make the output is on the same line?

Can you help me to combine both of these progeam to display the output in a row with two columns? The first column is for $1 and the second column is $2.
Kindly help me to solve this. Thank you :)
This is my code 1.
#!/usr/local/bin/perl
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Timing Path Group \'(\S+)\'/) {
$line = $1;
print ("$1\n");
}
}
close (FILE);
This is my code 2.
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Levels of Logic:\s+(\S+)/) {
$line = $1;
print ("$1\n");
}
}
close (FILE);
this is my output for code 1 which contain 26 line of data:
**async_default**
**clock_gating_default**
Ddia_link_clk
Ddib_link_clk
Ddic_link_clk
Ddid_link_clk
FEEDTHROUGH
INPUTS
Lclk
OUTPUTS
VISA_HIP_visa_tcss_2000
ckpll_npk_npkclk
clstr_fscan_scanclk_pulsegen
clstr_fscan_scanclk_pulsegen_notdiv
clstr_fscan_scanclk_wavegen
idvfreqA
idvfreqB
psf5_primclk
sb_nondet4tclk
sb_nondetl2tclk
sb_nondett2lclk
sbclk_nondet
sbclk_sa_det
stfclk_scan
tap4tclk
tapclk
The output code 1 also has same number of line.
paste is useful for this: assuming your shell is bash, then using process substitutions
paste <(perl script1.pl) <(perl script2.pl)
That emits columns separated by a tab character. For prettier output, you can pipe the output of paste to column
paste <(perl script1.pl) <(perl script2.pl) | column -t -s $'\t'
And with this, you con't need to try and "merge" your perl programs.
To combine the two scripts and to output two items of data on the same line, you need to hold on until the end of the file (or until you have both data items) and then output them at once. So you need to combine both loops into one:
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
my( $levels, $timing );
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Levels of Logic:\s+(\S+)/) {
$levels = $1;
}
if ($line=~ m/^\s+Timing Path Group \'(\S+)\'/) {
$timing = $1;
}
}
print "$levels, $timing\n";
close (FILE);
You still haven't given us one vital piece of information - what does the input data looks like. Most importantly, are the two pieces of information you're looking for on the same line?
[Update: Looking more closely at your regexes, I see it's possible for both pieces of information to be on the same line - as they are both supposed to be the first item on the line. It would be helpful if you were clearer about that in your question.]
I think this will do the right thing, no matter what the answer to your question is. I've also added the improvements I suggested in my answer to your previous question:
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $zipped = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $unzipped = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $zipped => $unzipped
or die "gunzip failed: $GunzipError\n";
open (my $fh, '<', $unzipped) or die "Cannot open '$unzipped': $!\n";
my ($levels, $timing);
while (<$fh>) {
chomp;
if (m/^\s+Levels of Logic:\s+(\S+)/) {
$levels = $1;
}
if (m/^\s+Timing Path Group \'(\S+)\'/) {
$timing = $1;
}
# If we have both values, then print them out and
# set the variables to 'undef' for the next iteration
if ($levels and $timing) {
print "$levels, $timing\n";
undef $levels;
undef $timing;
}
}
close ($fh);

perl write variables to a file

Here's my code to parse a configuration file, write the retrieved data to another file and send it to a MySQL database.
The database connection and writing data to a table works fine, however I can't get it to write data to the mentioned file mongoData.txt.
I'm quite new to Perl, so any help will be highly appreciated.
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $line;
# Retrieving data
open( my $FILE, "<", "/etc/mongod.conf" )
or die "Cannot find file! : $!\n";
while ( $line = <$FILE> ) {
chomp($line);
my ( $KEY, $VALUE ) = split /\:/, $line;
# Ignoring commented lines
$_ = $line;
unless ( $_ = ~/^#/ ) {
# Write to file
open my $FILE2, ">", "/home/sierra/Documents/mongoData.txt"
or die "Cannot create file $!\n";
print $FILE2 "$KEY", "$VALUE\n";
}
# Connection to SQL database
my $db = DBI->connect(( "dbi:mysql:dbname=mongodconf;
host = localhost;", "root", "sqladmin"
)) or die "can't connect to mysql";
# Inserting into database
$db->do("insert into data values ('$KEY', '$VALUE')")
or die "query error\n";
}
close($FILE);
Every time you open a file for output, you create a new file and delete any pre-existing file with the same name. That means you're going to be left with only the last line you wrote to the file
Here are some more pointers
Variable identifiers should in general be all in digits, lower case letters, and underscores. Capital letters are reserved for global identifiers such as package names
If you are running a version of Perl later than v5.14 then you can use autodie which checks all IO operations for you and removes the need to test the return status by hand
If you use a die string that has no newline at the end, then Perl will add information about the source file name and line number where it occurred, which can be useful for debugging
It is unnecessary to name your loop control variables. Programs can be made much more concise and readable by using Perl's pronoun variable $_ which is the default for many built-in operators
It is wasteful to reconnect to your database every time you need to make changes. You should connect once at the top of your program and use that static connection throughout your code
You should use placeholders when passing parameter expressions to an SQL operation. It can be dangerous, and that way DBI will quote them correctly for you
There is no need to close input files explicitly. Everything will be closed automatically at the end of the program. But if you are worried about the integrity of your output data, you may want to do an explicit close on output file handles so that you can check that they succeeded
Here's what I would write. Rather than testing whether each line of the input begins with a hash, it removes everything from the first hash character onwards and then checks to see if there are any non-blank characters in what remains. That allows for trailing comments in the data
#!/usr/bin/perl
use strict;
use warnings 'all';
use autodie;
use DBI;
my ($input, $output, $dsn) = qw{
/etc/mongod.conf
/home/sierra/Documents/mongoData.txt
dbi:mysql:dbname=mongodconf;host=localhost;
};
open my $fh, '<', $input;
open my $out_fh, '>', $output;
my $dbh = DBI->connect($dsn, qw/ root sqladmin /)
or die "Can't connect to MySQL: $DBI::errstr";
while ( <$fh> ) {
chomp;
s/#.*//;
next unless /\S/;
my ( $key, $val ) = split /\:/;
print $out_fh "$key $val\n";
$dbh->do('insert into data values (?, ?)', $key, $val);
}
close $out_fh or die $!;
$dbh->disconnect or warn $dbh->errstr;
You need to append the text into the creating new file mongoData.txt
while ($line=<$FILE>)
{
chomp ($line);
my ($KEY, $VALUE) = split /\:/,$line;
# Ignoring commented lines
$_ = $line;
unless ($_ = ~/^#/)
{
open my $FILE2, ">>", "/home/sierra/Documents/mongoData.txt" or die "Cannot create file $!\n";
print $FILE2 "$KEY","$VALUE\n";
}
}
close($FILE2);
or else
Create the text file once before your nesting the while loop
open my $FILE2, ">", "/home/sierra/Documents/mongoData.txt"
or die "Cannot create file $!\n";
while ($line=<$FILE>)
{
chomp ($line);
my ($KEY, $VALUE) = split /\:/,$line;
# Ignoring commented lines
$_ = $line;
unless ($_ = ~/^#/)
{
print $FILE2 "$KEY","$VALUE\n";
}
}
close($FILE2);
May be this will help you.

My perl script isn't working, I have a feeling it's the grep command

I'm trying for search in the one file for instances of the
number and post if the other file contains those numbers
#!/usr/bin/perl
open(file, "textIds.txt"); #
#file = <file>; #file looking into
# close file; #
while(<>){
$temp = $_;
$temp =~ tr/|/\t/; #puts tab between name and id
#arrayTemp = split("\t", $temp);
#found=grep{/$arrayTemp[1]/} <file>;
if (defined $found[0]){
#if (grep{/$arrayTemp[1]/} <file>){
print $_;
}
#found=();
}
print "\n";
close file;
#the input file lines have the format of
#John|7791 154
#Smith|5432 290
#Conor|6590 897
#And in the file the format is
#5432
#7791
#6590
#23140
There are some issues in your script.
Always include use strict; and use warnings;.
This would have told you about odd things in your script in advance.
Never use barewords as filehandles as they are global identifiers. Use three-parameter-open
instead: open( my $fh, '<', 'testIds.txt');
use autodie; or check whether the opening worked.
You read and store testIds.txt into the array #file but later on (in your grep) you are
again trying to read from that file(handle) (with <file>). As #PaulL said, this will always
give undef (false) because the file was already read.
Replacing | with tabs and then splitting at tabs is not neccessary. You can split at the
tabs and pipes at the same time as well (assuming "John|7791 154" is really "John|7791\t154").
Your talking about "input file" and "in file" without exactly telling which is which.
I assume your "textIds.txt" is the one with only the numbers and the other input file is the
one read from STDIN (with the |'s in it).
With this in mind your script could be written as:
#!/usr/bin/perl
use strict;
use warnings;
# Open 'textIds.txt' and slurp it into the array #file:
open( my $fh, '<', 'textIds.txt') or die "cannot open file: $!\n";
my #file = <$fh>;
close($fh);
# iterate over STDIN and compare with lines from 'textIds.txt':
while( my $line = <>) {
# split "John|7791\t154" into ("John", "7791", "154"):
my ($name, $number1, $number2) = split(/\||\t/, $line);
# compare $number1 to each member of #file and print if found:
if ( grep( /$number1/, #file) ) {
print $line;
}
}

How do I find the line a word is on when the user enters text in Perl?

I have a simple text file that includes all 50 states. I want the user to enter a word and have the program return the line the specific state is on in the file or otherwise display a "word not found" message. I do not know how to use find. Can someone assist with this? This is what I have so far.
#!/bin/perl -w
open(FILENAME,"<WordList.txt"); #opens WordList.txt
my(#list) = <FILENAME>; #read file into list
my($state); #create private "state" variable
print "Enter a US state to search for: \n"; #Print statement
$line = <STDIN>; #use of STDIN to read input from user
close (FILENAME);
An alternative solution that reads only the parts of the file until a result is found, or the file is exhausted:
use strict;
use warnings;
print "Enter a US state to search for: \n";
my $line = <STDIN>;
chomp($line);
# open file with 3 argument open (safer)
open my $fh, '<', 'WordList.txt'
or die "Unable to open 'WordList.txt' for reading: $!";
# read the file until result is found or the file is exhausted
my $found = 0;
while ( my $row = <$fh> ) {
chomp($row);
next unless $row eq $line;
# $. is a special variable representing the line number
# of the currently(most recently) accessed filehandle
print "Found '$line' on line# $.\n";
$found = 1; # indicate that you found a result
last; # stop searching
}
close($fh);
unless ( $found ) {
print "'$line' was not found\n";
}
General notes:
always use strict; and use warnings; they will save you from a wide range of bugs
3 argument open is generally preferred, as well as the or die ... statement. If you are unable to open the file, reading from the filehandle will fail
$. documentation can be found in perldoc perlvar
Tool for the job is grep.
chomp ( $line ); #remove linefeeds
print "$line is in list\n" if grep { m/^\Q$line\E$/g } #list;
You could also transform your #list into a hash, and test that, using map:
my %cities = map { $_ => 1 } #list;
if ( $cities{$line} ) { print "$line is in list\n";}
Note - the above, because of the presence of ^ and $ is an exact match (and case sensitive). You can easily adjust it to support fuzzier scenarios.

Perl - Read multiple files and read line by line of the text file

I am trying to read multiple .txt files in a folder. Each file should be read line by line, however, I failed to read multiple .txt files by using glob. Any advice on my code?
my %data;
#FILES = glob("*.txt");
$EmailMsg .= "EG. Folder(week) = Folder(CW01) --CW01 = Week 1 -- Number is week\n ";
$EmailMsg .= "=======================================================================================================\n";
# Try to Loop multiple files here
foreach my $file (#FILES) {
local $/ = undef;
open my $fh, '<', $file;
$data{$file} = <$fh>;
# Read the file one line at a time.
while (my $line = <$fh>) {
chomp $line;
$line =~ s/^\s+//;
$line =~ s/\s+$//;
my ($name, $date, $week) = split /\:/, $line;
if ($name eq "NoneFolder") {
$EmailMsg .= "Folder ($week) - No Folder created on the FTP! Failed to open folder!\n";
}
if ($name eq "EmptyFiles") {
$EmailMsg .= "Folder ($week) - No Files insides the folder! Failed download files!\n";
}
}
}
$EmailMsg .= "=======================================================================================================\n";
$EmailMsg .= "Please note that if you receive this email means that the script is running fine just that no folder is created or no files inside the folder for the week on the FTP.\n";
# close the file.
#close <$fh>;
Currently output:
EG. Folder(week) = Folder(CW01) --CW01 = Week 1 -- Number is week
=======================================================================================================
=======================================================================================================
Please note that if you receive this email means that the script is running fine just that no folder is created or no files inside the folder for the week on the FTP.
It failed to get any .txt files.
You are trying to read each file twice: firstly into the hash %data and then again line by line.
Once you have reached end of file, you have to either reopen the file or use seek to move the read pointer back to the beginning.
You also need to set $/ back to its original value, otherwise your loop will read the entire file instead of one line at a time.
It's not clear whether you really need the second copy of the file data in the hash, but you can avoid having to reset $/ by putting the change within a block, like this
open my $fh, '<', $file;
$data{$file} = do {
local $/ = undef;
<$fh>;
};
and then reset the file pointer to the start again before the while loop.
seek $fh, 0, 0;
#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';
my #files=('Read a file.pl','Read a single text file.pl','Read only one
file.pl','Read the file using while.pl','Reading the file.pl');
foreach my $i(#files) {
open(FH, "<$i");
{
while (my $row = <FH>) {
chomp $row;
print "$row\n";
}
}
}
The file globbing works for me. You might want to specify scope for your #FILES variable and check that there actually are files matching the path you have specified,
#!/bin/env perl
use strict;
use warnings;
## glob on all files in home directory
## see: http://perldoc.perl.org/File/Glob.html
use File::Glob ':globally';
my #configs = <~myname/project/etc/*.cfg>;
foreach my $fn (#configs) {
print "file $fn\n";
}
your code,
my %data;
#here are some .c files,
my #FILES = glob("../*.c");
foreach my $fn (#FILES) {
print "file $fn\n";
}
exit;
This way catches more garbage for about the same amount of code.
my $PATH = shift #ARGV ;
chomp $PATH ;
opendir(TXTFILE,$PATH) || die ("failed to opendir: $PATH") ;
my #file = readdir TXTFILE ;
closedir(TXTFILE) ;
foreach(#file) { #
next unless ($_ =~ /\.txt$/i) ; # Only get .txt files
$PATH =~ s/\/$//g ; $PATH =~ s/$/\// ; # Uniform trailing slash
my $thisfile = $PATH . $_ ; # now a fully qualified filename
unless (open(THISFILE,$thisfile)) { # Notify on busted files.
warn ("$thisfile failed to open") ;
next ;
}
while(<THISFILE>) {
# etc. etc.
}
close(THISFILE) ;
}