I have written a Perl script to parse through a file, scrub it, and put it in a new file. Was using test data that I was originally given to work with, but now I've gotten all the actual data and it turns out there are a good deal of records I will NOT want in the newly scrubbed file (mainly because too many of the fields in those records are empty).
So I now need to check if a particular field in a record is empty and if so, write it out to an "error" file and not write it out to the scrubbed data file. Below is my script (and before people bring it up, I do not have the Text::CSV module nor will I ever have it available)
NOTE - until I tried putting the IF/ELSE statement in there, the code was working with the data I had prior to being given the actual data with these problem records.
#!/usr/bin/perl/
use strict;
use warnings;
use Data::Dumper;
use Time::Piece;
my $filename = 'uncleanData.csv';
open my $FH, $filename
or die "Could not read from $filename <$!>, program halting.";
# Read the header line.
chomp(my $line = <$FH>);
my #fields = split(/,/, $line);
print Dumper(#fields), $/;
my #data;
# Read the lines one by one.
while($line = <$FH>) {
chomp($line);
Here is the new IF statement I put in with the code below the ELSE having not changed from my prior working script -
# Check if the storeNbr field is empty. If so, write record to error file.
if (!length $fields[28]) {
open ( my $ERR_FH, '>', "errorFiles.csv" ) or die $!;
print $ERR_FH join(',', #$_), $/ for #data;
close $ERR_FH;
}
else
{
# Scrub data of characters that cause scripting problems down the line.
$line =~ s/[\'\\]/ /g;
# split the fields, concatenate fields 28-30, and add the
# concatenated field to the beginning of each line in the file
my #fields = split(/,/, $line);
unshift #fields, join '_', #fields[28..30];
# Format the DATE fields for MySQL
$_ = join '-', (split /\//)[2,0,1] for #fields[10,14,24,26];
# Scrub colons from the data
$line =~ s/:/ /g;
# If Spectro_Model is "UNKNOWN", change
if($fields[22] eq "UNKNOWN"){
$_ = 'UNKNOW' for $fields[22];
}
# If tran_date is blank, insert 0000-00-00
if(!length $fields[10]){
$_ = '0000-00-00' for $fields[10];
}
# If init_tran_date is blank, insert 0000-00-00
if(!length $fields[14]){
$_ = '0000-00-00' for $fields[14];
}
# If update_tran_date is blank, insert 0000-00-00
if(!length $fields[24]){
$_ = '0000-00-00' for $fields[24];
}
# If cancel_date is blank, insert 0000-00-00
if(!length $fields[26]){
$_ = '0000-00-00' for $fields[26];
}
# Format the PROD_NBR field by deleting any leading zeros before decimals.
$fields[12] =~ s/^\s*0\././;
# put the records back
push #data, \#fields;
}
}
close $FH;
print "Unsorted:\n", Dumper(#data); #, $/;
#Sort the clean files on Primary Key, initTranDate, updateTranDate, and updateTranTime
#data = sort {
$a->[0] cmp $b->[0] ||
$a->[14] cmp $b->[14] ||
$a->[26] cmp $b->[26] ||
$a->[27] cmp $b-> [27]
} #data;
#open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/parsedMistints.csv';
open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/cleaned1502.csv';
print $OFH join(',', #$_), $/ for #data;
close $OFH;
exit;
I'm guessing my problem is where I am putting the closing brace } for the ELSE part of the statement. Here are some sample records from the file with the last file being one of the "problem" records -
650096571,1,1,used as store paint,14,IFC 8012NP,Standalone-9,3596,56,1/31/2015,80813,A97W01251,,1/16/2015,0.25,0.25,,SW,CUSTOM MATCH,TRUE,O,xts,,,,,,,1568,61006,1,FALSE
650368376,1,3,Tinted Wrong Color,16,IFC 8012NP,01DX8015206,,6,1/31/2015,160720,A87W01151,MATCH,1/31/2015,1,1,ENG,CUST,CUSTOM MATCH,TRUE,O,Ci52,,,,,,,1584,137252,1,FALSE
650175433,3,1,not tinted - e.w.,16,COROB MODULA HF,Standalone-7,,2,1/31/2015,95555,B20W02651,,1/29/2015,3,3,,COMP,CUSTOM MATCH,TRUE,P,xts,,,,,,,1627,68092,5,FALSE
650187016,2,1,checked out under cash ,,,,,,,,,,,,,,,,,,,,,,,,,,,,
When I run this script, it's still processing the "error records" and throwing up all kinds of "unitialized value" warnings.
Text::CSV is useful if you need to handle quotes or embedded linefeeds. Text::ParseWords can do as a substitute instead if you need that capability.
But as long as you don't have quoting to worry about, split works just fine.
You can do something like:
#!/usr/bin/env perl
use strict;
use warnings;
open ( my $normal_fh, '>', "output.txt" ) or die $!;
open ( my $err_fh, '>', "errors.txt" ) or die $!;
while ( <> ) {
if ( ( split /,/ ) [27] =~ /\w/ ) {
select $normal_fh;
}
else {
select $err_fh;
}
print;
}
Related
I am trying to convert a string into an array based on space delimiter.
My input file looks like this:
>Reference
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnctcACCATGGTGTCGACTC
TTCTATGGAAACAGCGTGGATGGCGTCTCCAGGCGATCTGACGGTTCACTAAACGAGCTC
Ignoring the line starting with >, the length of rest of the string is 360.
I am trying to convert this into an array.
Here's my code so far:
#!/usr/bin/perl
use strict;
use warnings;
#### To to change bases with less than 10X coverage to N #####
#### Take depth file and consensus fasta file as input arguments ####
my ($in2) = #ARGV;
my $args = $#ARGV + 1;
if ( $args != 1 ) {
print "Error!!! Insufficient Number of Argumrnts\n";
print "Usage: $0 <consensus fasta file> \n";
}
#### Open a filehandle to read in consensus fasta file ####
my $FH2;
my $line;
my #consensus;
my $char;
open($FH2, '<', $in2) || die "Could not open file $in2\n";
while ( <$FH2> ) {
$line = $_;
chomp $line;
next if $line =~ />/; # skip header line
$line =~ s/\s+//g;
my $len = length($line);
print "$len\n";
#print "$line";
#consensus = split(// , $line);
print "$#consensus\n";
#print "#consensus\n";
#for $char (0 .. $#consensus){
# print "$char: $consensus[$char]\n";
# }
}
The problem is the $len variable returns a value of 60 instead of 360 and $#consensus returns a value of 59 instead of 360 which is the length of the string.
I have removed the whitespace after each line with code $line =~ s/\s+//g;but it still is not working.
It looks like your code is essentially working. It's just your checking logic that makes no sense. I'd do the following:
use strict;
use warnings;
if (#ARGV != 1) {
print STDERR "Usage: $0 <consensus fasta file>\n";
exit 1;
}
open my $fh, '<', $ARGV[0] or die "$0: cannot open $ARGV[0]: $!\n";
my #consensus;
while (my $line = readline $fh) {
next if $line =~ /^>/;
$line =~ s/\s+//g;
push #consensus, split //, $line;
}
print "N = ", scalar #consensus, "\n";
Main things to note:
Error messages should go to STDERR, not STDOUT.
If an error occurs, the program should exit with an error code, not keep running.
Error messages should include the name of the program and the reason for the error.
chomp is redundant if you're going to remove all whitespace anyway.
As you're processing the input line by line, you can just keep pushing elements to the end of #consensus. At the end of the loop it'll have accumulated all characters across all lines.
Examining #consensus within the loop makes little sense as it hasn't finished building yet. Only after the loop do we have all characters we're interested in.
Here's my code to parse a configuration file, write the retrieved data to another file and send it to a MySQL database.
The database connection and writing data to a table works fine, however I can't get it to write data to the mentioned file mongoData.txt.
I'm quite new to Perl, so any help will be highly appreciated.
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $line;
# Retrieving data
open( my $FILE, "<", "/etc/mongod.conf" )
or die "Cannot find file! : $!\n";
while ( $line = <$FILE> ) {
chomp($line);
my ( $KEY, $VALUE ) = split /\:/, $line;
# Ignoring commented lines
$_ = $line;
unless ( $_ = ~/^#/ ) {
# Write to file
open my $FILE2, ">", "/home/sierra/Documents/mongoData.txt"
or die "Cannot create file $!\n";
print $FILE2 "$KEY", "$VALUE\n";
}
# Connection to SQL database
my $db = DBI->connect(( "dbi:mysql:dbname=mongodconf;
host = localhost;", "root", "sqladmin"
)) or die "can't connect to mysql";
# Inserting into database
$db->do("insert into data values ('$KEY', '$VALUE')")
or die "query error\n";
}
close($FILE);
Every time you open a file for output, you create a new file and delete any pre-existing file with the same name. That means you're going to be left with only the last line you wrote to the file
Here are some more pointers
Variable identifiers should in general be all in digits, lower case letters, and underscores. Capital letters are reserved for global identifiers such as package names
If you are running a version of Perl later than v5.14 then you can use autodie which checks all IO operations for you and removes the need to test the return status by hand
If you use a die string that has no newline at the end, then Perl will add information about the source file name and line number where it occurred, which can be useful for debugging
It is unnecessary to name your loop control variables. Programs can be made much more concise and readable by using Perl's pronoun variable $_ which is the default for many built-in operators
It is wasteful to reconnect to your database every time you need to make changes. You should connect once at the top of your program and use that static connection throughout your code
You should use placeholders when passing parameter expressions to an SQL operation. It can be dangerous, and that way DBI will quote them correctly for you
There is no need to close input files explicitly. Everything will be closed automatically at the end of the program. But if you are worried about the integrity of your output data, you may want to do an explicit close on output file handles so that you can check that they succeeded
Here's what I would write. Rather than testing whether each line of the input begins with a hash, it removes everything from the first hash character onwards and then checks to see if there are any non-blank characters in what remains. That allows for trailing comments in the data
#!/usr/bin/perl
use strict;
use warnings 'all';
use autodie;
use DBI;
my ($input, $output, $dsn) = qw{
/etc/mongod.conf
/home/sierra/Documents/mongoData.txt
dbi:mysql:dbname=mongodconf;host=localhost;
};
open my $fh, '<', $input;
open my $out_fh, '>', $output;
my $dbh = DBI->connect($dsn, qw/ root sqladmin /)
or die "Can't connect to MySQL: $DBI::errstr";
while ( <$fh> ) {
chomp;
s/#.*//;
next unless /\S/;
my ( $key, $val ) = split /\:/;
print $out_fh "$key $val\n";
$dbh->do('insert into data values (?, ?)', $key, $val);
}
close $out_fh or die $!;
$dbh->disconnect or warn $dbh->errstr;
You need to append the text into the creating new file mongoData.txt
while ($line=<$FILE>)
{
chomp ($line);
my ($KEY, $VALUE) = split /\:/,$line;
# Ignoring commented lines
$_ = $line;
unless ($_ = ~/^#/)
{
open my $FILE2, ">>", "/home/sierra/Documents/mongoData.txt" or die "Cannot create file $!\n";
print $FILE2 "$KEY","$VALUE\n";
}
}
close($FILE2);
or else
Create the text file once before your nesting the while loop
open my $FILE2, ">", "/home/sierra/Documents/mongoData.txt"
or die "Cannot create file $!\n";
while ($line=<$FILE>)
{
chomp ($line);
my ($KEY, $VALUE) = split /\:/,$line;
# Ignoring commented lines
$_ = $line;
unless ($_ = ~/^#/)
{
print $FILE2 "$KEY","$VALUE\n";
}
}
close($FILE2);
May be this will help you.
Okay - I'm going to post my entire script since I get chastised when I don't do it - even though, last time I did that I got chastised for posting the whole script. I simply need to know if the one line I originally asked about would work. ENTIRE SCRIPT (which was working just fine until the other dept gave me their data entirely differently than what we were originally told it would be) TO FOLLOW AT THE END
I'm parsing through and scrubbing a CSV file to make it ready to be loaded in a MySQL table. It is loaded through the table via someone else's "batch Java program" and if any field is empty the batch file stops with an error.
I've been told to just put in a blank space whenever there's an empty field in any record. Would something as simple as this work?
if ( ! length $fields[2] ) {
$_ = ' ' for $fields[2];
}
And would there be a way to check either various multiple fields at once? Or what might be better would be to check ALL the fields (this is after the record has been split) as the last thing I do just before writing the record back out to the CSV file.
Here's the entire script. Please don't tell me how what I'm doing within the already working script is not how you would do it. -
#!/usr/bin/perl/
use strict;
use warnings;
use Data::Dumper;
use Time::Piece;
my $filename = 'mistints_1505_comma.csv';
#my $filename = 'test.csv';
# Open input file
open my $FH, $filename
or die "Could not read from $filename <$!>, program halting.";
# Open error handling file
open ( my $ERR_FH, '>', "errorFiles1505.csv" ) or die $!;
# Read the header line of the input file and print to screen.
chomp(my $line = <$FH>);
my #fields = split(/,/, $line);
print Dumper(#fields), $/;
my #data;
# Read the lines one by one.
while($line = <$FH>) {
chomp($line);
# Scrub data of characters that cause scripting problems down the line.
$line =~ s/[\'\\]/ /g;
# split the fields of each record
my #fields = split(/,/, $line);
# Check if the storeNbr field is empty. If so, write record to error file.
if (!length $fields[28]) {
chomp (#fields);
my $str = join ',', #fields;
print $ERR_FH "$str\n";
}
else
{
# Concatenate the first three fields and add to the beginning of each record
unshift #fields, join '_', #fields[28..30];
# Format the DATE fields for MySQL
$_ = join '-', (split /\//)[2,0,1] for #fields[10,14,24,26];
# Scrub colons from the data
$line =~ s/:/ /g;
# If Spectro_Model is "UNKNOWN", change
if($fields[22] eq "UNKNOWN"){
$_ = 'UNKNOW' for $fields[22];
}
# If tran_date is blank, insert 0000-00-00
if(!length $fields[10]){
$_ = '0000-00-00' for $fields[10];
}
# If init_tran_date is blank, insert 0000-00-00
if(!length $fields[14]){
$_ = '0000-00-00' for $fields[14];
}
# If update_tran_date is blank, insert 0000-00-00
if(!length $fields[24]){
$_ = '0000-00-00' for $fields[24];
}
# If cancel_date is blank, insert 0000-00-00
if(!length $fields[26]){
$_ = '0000-00-00' for $fields[26];
}
# Format the PROD_NBR field by deleting any leading zeros before decimals.
$fields[12] =~ s/^\s*0\././;
# put the records back
push #data, \#fields;
}
}
close $FH;
close $ERR_FH;
print "Unsorted:\n", Dumper(#data); #, $/;
#Sort the clean files on Primary Key, initTranDate, updateTranDate, and updateTranTime
#data = sort {
$a->[0] cmp $b->[0] ||
$a->[14] cmp $b->[14] ||
$a->[26] cmp $b->[26] ||
$a->[27] cmp $b-> [27]
} #data;
#open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/parsedMistints.csv';
open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/cleaned1505.csv';
print $OFH join(',', #$_), $/ for #data;
close $OFH;
exit;
As far as I can tell you have split a record on commas ,, and you want to alter all fields that are empty strings to contain a single space
I would write this
use strict;
use warnings 'all';
my $record = 'a,b,c,,e,,g,,i,,k,,m,n,o,p,q,r,s,t';
my #fields = map { $_ eq "" ? ' ' : $_ } split /,/, $record;
use Data::Dump;
dd \#fields;
output
[ "a", "b", "c", " ", "e", " ", "g", " ", "i", " ", "k", " ", "m" .. "t" ]
Alternatively, if you have some fields that need to be set to something different if they are empty, you can set up an array of defaults
That would look like this. All of the #defaults array is set to spaces except for fields 10, 11 and 12, which are 0000-00-00. These are picked up after the record is split
use strict;
use warnings 'all';
my #defaults = (' ') x 20;
$defaults[$_] = '0000-00-00' for 9, 10, 11;
my $record = 'a,b,c,,e,,g,,i,,k,,m,n,o,p,q,r,s,t';
my #fields = split /,/, $record;
for my $i ( 0 .. $#fields ) {
$fields[$i] = $defaults[$i] if $fields[$i] eq '';
}
use Data::Dump;
dd \#fields;
output
[ "a", "b", "c", " ", "e", " ", "g", " ", "i", "0000-00-00", "k", "0000-00-00", "m" .. "t" ]
Having seen your full program, I recommend something like this. If you had shown a sample of your input data then I could have used a hash to refer to column names instead of numbers, making it much more readable
#!/usr/bin/perl/
use strict;
use warnings 'all';
use Data::Dumper;
use Time::Piece;
my $filename = 'mistints_1505_comma.csv';
#my $filename = 'test.csv';
open my $FH, $filename
or die "Could not read from $filename <$!>, program halting.";
open( my $ERR_FH, '>', "errorFiles1505.csv" ) or die $!;
chomp( my $line = <$FH> );
my #fields = split /,/, $line; #/
print Dumper( \#fields ), "\n";
my #data;
# Read the lines one by one.
while ( <$FH> ) {
chomp;
# Scrub data of characters that cause scripting problems down the line.
tr/'\\/ /; #'
my #fields = split /,/; #/
# Check if the storeNbr field is empty. If so, write record to error file.
if ( $fields[28] eq "" ) {
my $str = join ',', #fields;
print $ERR_FH "$str\n";
next;
}
# Concatenate the first three fields and add to the beginning of each record
unshift #fields, join '_', #fields[ 28 .. 30 ];
# Format the DATE fields for MySQL
$_ = join '-', ( split /\// )[ 2, 0, 1 ] for #fields[ 10, 14, 24, 26 ];
# Scrub colons from the data
tr/://d; #/
my $i = 0;
for ( #fields ) {
# If "Spectro_Model" is "UNKNOWN" then change to "UNKNOW"
if ( $i == 22 ) {
$_ = 'UNKNOW' if $_ eq 'UNKNOWN';
}
# If a date field is blank then insert 0000-00-00
elsif ( grep { $i == $_ } 10, 14, 24, 26 ) {
$_ = '0000-00-00' if $_ eq "";
}
# Format the PROD_NBR field by deleting any leading zeros before decimals.
elsif ( $i == 12 ) {
s/^\s*0\././;
}
# Change all remaining empty fields to a single space
else {
$_ = ' ' if $_ eq "";
}
++$i;
}
push #data, \#fields;
}
close $FH;
close $ERR_FH;
print "Unsorted:\n", Dumper(#data); #, $/;
#Sort the clean files on Primary Key, initTranDate, updateTranDate, and updateTranTime
#data = sort {
$a->[0] cmp $b->[0] or
$a->[14] cmp $b->[14] or
$a->[26] cmp $b->[26] or
$a->[27] cmp $b->[27]
} #data;
#open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/parsedMistints.csv';
open my $OFH, '>', '/swpkg/shared/batch_processing/mistints/cleaned1505.csv' or die $!;
print $OFH join(',', #$_), $/ for #data;
close $OFH;
Well if you did it before splitting into $fields, you ought to be able to do something like
# assuming a CSV line is in $_
#pad null at start of line
s/^,/ ,/;
#pad nulls in the middle
s/,,/, ,/g;
#pad null at the end
s/,$/, /;
Don't try to roll out your own CSV parsing code. Use Text::CSV or Text::CSV::Slurp.
With Text::CSV you could do something like
$line = $csv->string(); # get the combined string
$status = $csv->parse($line); # parse a CSV string into fields
#columns = map {defined $_ ? $_ : " "} $csv->fields(); # get the parsed fields
Are you really sure you want to replace nulls with spaces? I'd say if the field is undefined it should be NULL in db.
So, I'm super new to perl... I'm on a windows box within a Corporate network. I am unable to download CPAN modules...
I have a CSV that is exported with multiple columns, the 2nd column contains 2 rows with text and the rest of the rows are IP addresses. I need to change the first 3 octets of the IP for all the rows excluding the text obviously. I then need to either save it to the same file, or create a new one... I also need all the other columns to remain in the document. I've looked and looked and everyone suggests modules which makes sense but I cannot get them. Here is my code:
if ( $cmd[0] eq "update" ) {
print "Old subnet ex: 10.0.0\n";
my $oldsubnet = <STDIN>;
chomp $oldsubnet;
print "New subnet ex: 10.0.0\n";
my $newsubnet = <STDIN>;
chomp $newsubnet;
my $file = "path\\file.csv";
open( my $fh, '<', $file ) or die "'$file' would not open $!";
while ( my $line = <$fh> ) {
chomp $line;
my #fields = split ",", $line;
my $string = $fields[1];
$string =~ s/$oldsubnet/$newsubnet/g;
my $ofile = "path\\test.csv";
open( my $ofh, '>>', $ofile ) or die "'$ofile' would not open $!";
print $ofh "$string\n";
}
}
So I just end up with a single column with the updated IP's with this code. How in the world do I get the rest of the CSV... Is there an easier way to just replace the IP's???? It seems like this would be easy but the CSV is making it hard. Any help would be appreciated. P.S. I am using strict and warnings :)
So just to clarify I am aware that #fields is being left out and thus not being printed... What I'm asking is how do I update only the data in the 2nd column and then put the whole document back into a new file?
You need to output more than just the changed field...
my #fields = split(/,/, $line);
$fields[1] =~ s/$oldsubnet/$newsubnet/g;
print $ofh join(',', #fields);
Also, don't open the output file in the loop to read the input file.
Without your source CSV, I can't say for sure, but would be thinking "use the Text::CSV module":
#!/usr/bin/env perl;
use strict;
use warnings;
use Text::CSV;
print "Old subnet ex: 10.0.0\n";
my $oldsubnet = <STDIN>;
chomp $oldsubnet;
print "New subnet ex: 10.0.0\n";
my $newsubnet = <STDIN>;
chomp $newsubnet;
my $file = "path\\file.csv";
my $ofile = "path\\test.csv";
open( my $input, '<', $file ) or die "'$file' would not open $!";
open( my $output, '>', $ofile ) or die "'$ofile' would not open $!";
my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
while ( my $row = $csv->getline() ) {
$row->[1] =~ s/$oldsubnet/$newsubnet/;
$csv->print( $output, $row );
}
It's not always the best answer, but where you're trying to read CSV, change a single field, and write CSV it works quite nicely.
I need some perl help in putting these (2) processes/code to work together. I was able to get them working individually to test, but I need help bringing them together especially with using the loop constructs. I'm not sure if I should go with foreach..anyways the code is below.
Also, any best practices would be great too as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if i read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
next if ($file =~ /^\.+$/);
#Get filename attributes
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
print "$1\n";
print "$2\n";
print "$3\n";
}
print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my #heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
my $digest = md5_hex($data);
chomp;
my (#values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2" ;
$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(#values));
$data .= "$extra$eorec";
print NEWFILE "$data";
}
#print $data;
close (NEWFILE);
You are using an old-style of Perl programming. I recommend you to use functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...
sub get_input_files {
# return an array of files (#)
}
sub extract_file_info {
# takes the file name and returs an array of values (filename attrs)
}
sub process_file {
# reads the input file, takes the previous attribs and build the output file
}
my #ifiles = get_input_files;
foreach my $ifile(#ifiles) {
my #attrs = extract_file_info($ifile);
process_file($ifile, #attrs);
}
Hope it helps
I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl
# - Never forget these!
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
# Parens on postfix "if" are optional; I prefer to omit them
next if $file =~ /^\.+$/;
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
process_file($file, $1, $2, $3);
}
print "$file\n";
}
sub process_file {
my ($orig_name, $foo_x, $name_x, $p_x) = #_;
my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";
# - From your description of the task, it sounds like we actually want to
# read from the found file, not from <>, so opening it here to read
# - Better to use lexical ("my") filehandle and three-arg form of open
# - "or" has lower operator precedence than "||", so less chance of
# things being grouped in the wrong order (though either works here)
# - Including $! in the error will tell why the file open failed
open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";
my $data = '';
my $line1 = <$in_fh>;
chomp $line1;
my #heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
while (<$in_fh>) {
chomp;
my $digest = md5_hex($data);
my (#values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2";
$extra .= "$heading[$_]$sep1$values[$_]$sep2"
for (0 .. scalar(#values));
# - Useless use of double quotes removed on next two lines
$data .= $extra . $eorec;
#print $out_fh $data;
}
# - Moved print to output file to here (where it will print the complete
# output all at once) rather than within the loop (where it will print
# all previous lines each time a new line is read in) to prevent
# duplicate output records. This could also be achieved by printing
# $extra inside the loop. Printing $data at the end will be slightly
# faster, but requires more memory; printing $extra within the loop and
# getting rid of $data entirely would require less memory, so that may
# be the better option if you find yourself needing to read huge input
# files.
print $out_fh $data;
# - $in_fh and $out_fh will be closed automatically when it goes out of
# scope at the end of the block/sub, so there's no real point to
# explicitly closing it unless you're going to check whether the close
# succeeded or failed (which can happen in odd cases usually involving
# full or failing disks when writing; I'm not aware of any way that
# closing a file open for reading can fail, so that's just being left
# implicit)
close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.