How to read binary file in Perl - perl

I'm having an issue with writing a Perl script to read a binary file.
My code is as the following whereby the $file are files in binary format. I tried to search through the web and apply in my code, tried to print it out, but it seems it doesn't work well.
Currently it only prints the '&&&&&&&&&&&" and ""ppppppppppp", but what I really want is it can print out each of the $line, so that I can do some other post processing later. Also, I'm not quite sure what the $data is as I see it is part of the code from sample in article, stating suppose to be a scalar. I need somebody who can pin point me where the error goes wrong in my code. Below is what I did.
my $tmp = "$basedir/$key";
opendir (TEMP1, "$tmp");
my #dirs = readdir(TEMP1);
closedir(TEMP1);
foreach my $dirs (#dirs) {
next if ($dirs eq "." || $dirs eq "..");
print "---->$dirs\n";
my $d = "$basedir/$key/$dirs";
if (-d "$d") {
opendir (TEMP2, $d) || die $!;
my #files = readdir (TEMP2); # This should read binary files
closedir (TEMP2);
#my $buffer = "";
#opendir (FILE, $d) || die $!;
#binmode (FILE);
#my #files = readdir (FILE, $buffer, 169108570);
#closedir (FILE);
foreach my $file (#files) {
next if ($file eq "." || $file eq "..");
my $f = "$d/$file";
print "==>$file\n";
open FILE, $file || die $!;
binmode FILE;
foreach ($line = read (FILE, $data, 169108570)) {
print "&&&&&&&&&&&$line\n";
print "ppppppppppp$data\n";
}
close FILE;
}
}
}
I have altered my code so that it goes like as below. Now I can read the $data. Thanks J-16 SDiZ for pointing out that. I'm trying to push the info I got from the binary file to an array called "#array", thinkking to grep data from the array for string whichever match "p04" but fail. Can someone point out where is the error?
my $tmp = "$basedir/$key";
opendir (TEMP1, "$tmp");
my #dirs = readdir (TEMP1);
closedir (TEMP1);
foreach my $dirs (#dirs) {
next if ($dirs eq "." || $dirs eq "..");
print "---->$dirs\n";
my $d = "$basedir/$key/$dirs";
if (-d "$d") {
opendir (TEMP2, $d) || die $!;
my #files = readdir (TEMP2); #This should read binary files
closedir (TEMP2);
foreach my $file (#files) {
next if ($file eq "." || $file eq "..");
my $f = "$d/$file";
print "==>$file\n";
open FILE, $file || die $!;
binmode FILE;
foreach ($line = read (FILE, $data, 169108570)) {
print "&&&&&&&&&&&$line\n";
print "ppppppppppp$data\n";
push #array, $data;
}
close FILE;
}
}
}
foreach $item (#array) {
#print "==>$item<==\n"; # It prints out content of binary file without the ==> and <== if I uncomment this.. weird!
if ($item =~ /p04(.*)/) {
print "=>$item<===============\n"; # It prints "=><===============" according to the number of binary file I have. This is wrong that I aspect it to print the content of each binary file instead :(
next if ($item !~ /^w+/);
open (LOG, ">log") or die $!;
#print LOG $item;
close LOG;
}
}
Again, I changed my code as following, but it still doesn't work as it do not able to grep the "p04" correctly by checking on the "log" file. It did grep the whole file including binary like this "#^#^#^#^G^D^#^#^#^^#p04bbhi06^#^^#^#^#^#^#^#^#^#hh^R^#^#^#^^#^#^#p04lohhj09^#^#^#^^##" . What I'm aspecting is it do grep the anything with p04 only such as grepping p04bbhi06 and p04lohhj09. Here is how my code goes:-
foreach my $file (#files) {
next if ($file eq "." || $file eq "..");
my $f = "$d/$file";
print "==>$file\n";
open FILE, $f || die $!;
binmode FILE;
my #lines = <FILE>;
close FILE;
foreach $cell (#lines) {
if ($cell =~ /b12/) {
push #array, $cell;
}
}
}
#my #matches = grep /p04/, #lines;
#foreach $item (#matches) {
foreach $item (#array) {
#print "-->$item<--";
open (LOG, ">log") or die $!;
print LOG $item;
close LOG;
}

Use:
$line = read (FILE, $data, 169108570);
The data is in $data; and $line is the number of bytes read.
my $f = "$d/$file" ;
print "==>$file\n" ;
open FILE, $file || die $! ;
I guess the full path is in $f, but you are opening $file. (In my testing -- even $f is not the full path, but I guess you may have some other glue code...)
If you just want to walk all the files in a directory, try File::DirWalk or File::Find.

I am not sure if I understood you right.
If you need to read a binary file, you can do the same as for a text file:
open F, "/bin/bash";
my $file = do { local $/; <F> };
close F;
Under Windows you may need to add binmode F; under *nix it works without it.
If you need to find which lines in an array contains some word, you can use grep function:
my #matches = grep /something/, #array_to_grep;
You will get all matched lines in the new array #matches.
BTW: I don't think it's a good idea to read tons of binary files into memory at once. You can search them 1 by 1...
If you need to find where the match occurs you can use another standard function, index:
my $offset = index('myword', $file);

I'm not sure I'll be able to answer the OP question exactly, but here are some notes that may be related. (edit: this is the same approach as answer by #Dimanoid, but with more detail)
Say you have a file, which is a mix of ASCII data, and binary. Here is an example in a bash terminal:
$ echo -e "aa aa\x00\x0abb bb" | tee tester.txt
aa aa
bb bb
$ du -b tester.txt
13 tester.txt
$ hexdump -C tester.txt
00000000 61 61 20 61 61 00 0a 62 62 20 62 62 0a |aa aa..bb bb.|
0000000d
Note that byte 00 (specified as \x00) is a non-printable character, (and in C, it also means "end of a string") - thereby, its presence makes tester.txt a binary file. The file has size of 13 bytes as seen by du, because of the trailing \n added by the echo (as it can be seen from hexdump).
Now, let's see what happens when we try to read it with perl's <> diamond operator (see also What's the use of <> in perl?):
$ perl -e '
open IN, "<./tester.txt";
binmode(IN);
$data = <IN>; # does this slurp entire file in one go?
close(IN);
print "length is: " . length($data) . "\n";
print "data is: --$data--\n";
'
length is: 7
data is: --aa aa
--
Clearly, the entire file didn't get slurped - it broke at the line end \n (and not at the binary \x00). That is because the diamond filehandle <FH> operator is actually shortcut for readline (see Perl Cookbook: ChapterĀ 8, File Contents)
The same link tells that one should undef the input record separator, \$ (which by default is set to \n), in order to slurp the entire file. You may want to have this change be only local, which is why the braces and local are used instead of undef (see Perl Idioms Explained - my $string = do { local $/; };); so we have:
$ perl -e '
open IN, "<./tester.txt";
print "_$/_\n"; # check if $/ is \n
binmode(IN);
{
local $/; # undef $/; is global
$data = <IN>; # this should slurp one go now
};
print "_$/_\n"; # check again if $/ is \n
close(IN);
print "length is: " . length($data) . "\n";
print "data is: --$data--\n";
'
_
_
_
_
length is: 13
data is: --aa aa
bb bb
--
... and now we can see the file is slurped in its entirety.
Since binary data implies unprintable characters, you may want to inspect the actual contents of $data by printing via sprintf or pack/unpack instead.
Hope this helps someone,
Cheers!

Related

Search string with multiple words in the pattern

My program is trying to search a string from multiple files in a directory. The code searches for single patterns like perl but fails to search a long string like Status Code 1.
Can you please let me know how to search for strings with multiple words?
#!/usr/bin/perl
my #list = `find /home/ad -type f -mtime -1`;
# printf("Lsit is $list[1]\n");
foreach (#list) {
# print("Now is : $_");
open(FILE, $_);
$_ = <FILE>;
close(FILE);
unless ($_ =~ /perl/) { # works, but fails to find string "Status Code 1"
print "found\n";
my $filename = 'report.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
say $fh "My first report generated by perl";
close $fh;
} # end unless
} # end For
There are a number of problems with your code
You must always use strict and use warnings at the top of every Perl program. There is little point in delcaring anything with my without strict in place
The lines returned by the find command will have a newline at the end which must be removed before Perl can find the files
You should use lexical file handles (my $fh instead of FILE) and the three-parameter form of open as you do with your output file
$_ = <FILE> reads only the first line of the file into $_
unless ($_ =~ /perl/) is inverted logic, and there's no need to specify $_ as it is the default. You should write if ( /perl/ )
You can't use say unless you have use feature 'say' at the top of your program (or use 5.010, which adds all features available in Perl v5.10)
It is also best to avoid using shell commands as Perl is more than able to do anything that you can using command line utilities. In this case -f $file is a test that returns true if the file is a plain file, and -M $file returns the (floating point) number of days since the file's modification time
This is how I would write your program
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
for my $file ( glob '/home/ad/*' ) {
next unless -f $file and int(-M $file) == 1;
open my $fh, '<', $file or die $!;
while ( <$fh> ) {
if ( /perl/ ) {
print "found\n";
my $filename = 'report.txt';
open my $out_fh, '>>', $filename or die "Could not open file '$filename': $!";
say $fh "My first report generated by perl";
close $out_fh;
last;
}
}
}
it should have matched unless $_ contains text in different case.
try this.
unless($_ =~ /Status\s+Code\s+1/i) {
Change
unless ($_ =~ /perl/) {
to:
unless ($_ =~ /(Status Code 1)/) {
I am certain the above works, except it's case sensitive.
Since you question it, I rewrote your script to make more sense of what you're trying to accomplish and implement the above suggestion. Correct me if I am wrong, but you're trying to make a script which matches "Status Code 1" in a bunch of files where last modified within 1 day and print the filename to a text file.
Anyways, below is what I recommend:
#!/usr/bin/perl
use strict;
use warnings;
my $output_file = 'report.txt';
my #list = `find /home/ad -type f -mtime -1`;
foreach my $filename (#list) {
print "PROCESSING: $filename";
open (INCOMING, "<$filename") || die "FATAL: Could not open '$filename' $!";
foreach my $line (<INCOMING>) {
if ($line =~ /(Status Code 1)/) {
open( FILE, ">>$output_file") or die "FATAL: Could not open '$output_file' $!";
print FILE sprintf ("%s\n", $filename);
close(FILE) || die "FATAL: Could not CLOSE '$output_file' $!";
# Bail when we get the first match
last;
}
}
close(INCOMING) || die "FATAL: Could not close '$filename' $!";
}

In Perl, how can filter all log files in a directory, and extract interesting lines?

I'm trying to select only the .log files in my directory and then search in those files for the word "unbound" and print the entire line into a new output file with the same name as the log file (number###.log) but with a .txt extension. This is what I have so far:
#!/usr/bin/perl
use strict;
use warnings;
my $path = $ARGV[0];
my $outpath = $ARGV[1];
my #files;
my $files;
opendir(DIR,$path) or die "$!";
#files = grep { /\.log$/} readdir(DIR);
my #out;
my $out;
opendir(OUT,$outpath) or die "$!";
my $line;
foreach $files (#files) {
open (FILE, "$files");
my #line = <FILE>;
my $regex = Unbound;
open (OUT, ">>$out");
print grep {$line =~ /$regex/ } <>;
}
close OUT;
close FILE;
closedir(DIR);
closedir (OUT);
I'm a beginner, and I don't really know how to create a new text file with the acquired output.
Few things I'd suggest to improve this code:
declare your loop iterators within the loop. foreach my $file ( #files ) {
use 3 arg open: open ( my $input_fh, "<", $filename );
use glob rather than opendir then grep. foreach my $file ( <$path/*.txt> ) {
grep is good for extracting things into arrays. Your grep reads the whole file to print it, which isn't necessary. Doesn't matter much if the file is short though.
perltidy is great for reformatting code.
you're opening 'OUT' to a directory path (I think?) which isn't going to work.
$outpath isn't, it's a file. You need to do something different to output to different files. opendir isn't really valid to an output.
because you're using opendir that's actually giving you filenames - not full paths. So you might be in the wrong place to actually open the files. Prepending the path name, doing a chdir are possible solutions. But that's one of the reasons I like glob because it returns a path as well.
So with that in mind - how about:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
#Extract paths
my $input_path = $ARGV[0];
my $output_path = $ARGV[1];
#Error if paths are invalid.
unless (defined $input_path
and -d $input_path
and defined $output_path
and -d $output_path )
{
die "Usage: $0 <input_path> <output_path>\n";
}
foreach my $filename (<$input_path/*.log>) {
# extract the 'name' bit of the filename.
# be slightly careful with this - it's based
# on an assumption which isn't always true.
# File::Spec is a more powerful way of accomplishing this.
# but should grab 'number####' from /path/to/file/number####.log
my $output_file = basename ( $filename, '.log' );
#open input and output filehandles.
open( my $input_fh, "<", $filename ) or die $!;
open( my $output_fh, ">", "$output_path/$output_file.txt" ) or die $!;
print "Processing $filename -> $output_path/$output_file.txt\n";
#iterate input, extracting into $line
while ( my $line = <$input_fh> ) {
#check if $line matches your RE.
if ( $line =~ m/Unbound/ ) {
#write it to output.
print {$output_fh} $line;
}
}
#tidy up our filehandles. Although technically, they'll
#close automatically because they leave scope
close($output_fh);
close($input_fh);
}
Here is a script that takes advantage of Path::Tiny. Now, at this stage of your learning process, you are probably better off understanding #Sobrique's solution, but using modules such as Path::Tiny or Path::Class will make it easier to write these one off scripts more quickly, and correctly.
Also, I didn't really test this script, so watch out for bugs.
#!/usr/bin/env perl
use strict;
use warnings;
use Path::Tiny;
run(\#ARGV);
sub run {
my $argv = shift;
unless (#$argv == 2) {
die "Need source and destination paths\n";
}
my $it = path($argv->[0])->realpath->iterator({
recurse => 0,
follow_symlinks => 0,
});
my $outdir = path($argv->[1])->realpath;
while (my $path = $it->()) {
next unless -f $path;
next unless $path =~ /[.]log\z/;
my $logfh = $path->openr;
my $outfile = $outdir->child($path->basename('.log') . '.txt');
my $outfh;
while (my $line = <$logfh>) {
next unless $line =~ /Unbound/;
unless ($outfh) {
$outfh = $outfile->openw;
}
print $outfh $line;
}
close $outfh
or die "Cannot close output '$outfile': $!";
}
}
Notes
realpath will croak if the path provided does not exist.
Similarly for openr and openw.
I am reading input files line-by-line to keep the memory footprint of the program independent of the sizes of input files.
I do not open the output file until I know I have a match to print to.
When matching a file extension using a regular expression pattern, keep in mind that \n is a valid character in Unix file names, and the $ anchor will match it.

how to find the first occurrence of a string in all the files in a folder in perl

I'm trying to find the line of first occurrence of the string "victory" in each txt file in a folder. For each first "victory" in file I would like to save the number from that line to #num and the file name to #filename
Example: For the file a.txt that starts with the line: "lalala victory 123456" -> $num[$i]=123456 and $filename[$i]="a.txt"
ARGV holds all the file names. my problem is that I'm trying to go line by line and I don't know what I'm doing wrong.
one more thing - how can I get the last occurrence of "victory" in the last file??
use strict;
use warnings;
use File::Find;
my $dir = "D:/New folder";
find(sub { if (-f && /\.txt$/) { push #ARGV, $File::Find::name } }, $dir); $^I = ".bak";
my $argvv;
my $counter=0;
my $prev_arg=0;
my $line = 0;
my #filename=0;
my #num=0;
my $i = 0;
foreach $argvv (#ARGV)
{
#open $line, $argvv or die "Could not open file: $!";
my $line = IN
while (<$line>)
{
if (/victory/)
{
$line = s/[^0-9]//g;
$first_bit[$i] = $line;
$filename[$i]=$argvv;
$i++;
last;
}
}
close $line;
}
for ($i=0; $i<3; $i++)
{
print $filename[$i]." ".$num[$i]."\n";
}
Thank you very much! :)
Your example script has a number of minor problems. The following example should do what you want in a fairly clean manner:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
# Find the files we're interested in parsing
my #files = ();
my $dir = "D:/New folder";
find(sub { if (-f && /\.txt$/) { push #files, $File::Find::name } }, $dir);
# We'll store our results in a hash, rather than in 2 arrays as you did
my %foundItems = ();
foreach my $file (#files)
{
# Using a lexical file handle is the recommended way to open files
open my $in, '<', $file or die "Could not open $file: $!";
while (<$in>)
{
# Uncomment the next two lines to see what's being parsed
# chomp; # Not required, but helpful for the debug print below
# print "$_\n"; # Print out the line being parsed; for debugging
# Capture the number if we find the word 'victory'
# This assumes the number is immediately after the word; if that
# is not the case, it's up to you to modify the logic here
if (m/victory\s+(\d+)/)
{
$foundItems{$file} = $1; # Store the item
last;
}
}
close $in;
}
foreach my $file (sort keys %foundItems)
{
print "$file=> $foundItems{$file}\n";
}
the below searches for a string abc in all the files(file*.txt) and prints only the first line.
perl -lne 'BEGIN{$flag=1}if(/abc/ && $flag){print $_;$flag=0}if(eof){$flag=1}' file*.txt
tested:
> cat temp
abc 11
22
13
,,
abc 22
bb
cc
,,
ww
kk
ll
,,
> cat temp2
abc t goes into 1000
fileA1, act that abc specific place
> perl -lne 'BEGIN{$flag=1}if(/abc/ && $flag){print $_;$flag=0}if(eof){$flag=1}' temp temp2
abc 11
abc t goes into 1000
>

How to check for files that has two different extensions in Perl

I have a file reflog with the content as below. There will be items with same name but different extensions. I want to check that for each of the items (file1, file2 & file3 here as example), it needs to be exist in both extensions (.abc and .def). If both extensions exist, it will perform some regex and print out. Else it will just report out with the file name together with extension (ie, if only on of file1.abc or file1.def exists, it will be printed out).
reflog:
file1.abc
file2.abc
file2.def
file3.abc
file3.def
file4.abc
file5.abc
file5.def
file6.def
file8abc.def
file7.abc
file1.def
file9abc.def
file10def.abc
My script is as below (editted from yb007 script), but I have some issues with the output that I don;t know how to resolve. I notice the output is going to be wrong when the reflog file having any file with the name *abc.def (such as ie. file8abc.def & file9abc.def). It will be trim down the last 4 suffix and return the wrong .ext (which is .abc here but I suppose it should be .def).
#! /usr/bin/perl
use strict;
use warnings;
my #files_abc ;
my #files_def ;
my $line;
open(FILE1, 'reflog') || die ("Could not open reflog") ;
open (FILE2, '>log') || die ("Could not open log") ;
while ($line = <FILE1>) {
if($line=~ /(.*).abc/) {
push(#files_abc,$1);
} elsif ($line=~ /(.*).def/) {
push(#files_def,$1); }
}
close(FILE1);
my %first = map { $_ => 1 } #files_def ;
my #same = grep { $first{$_} } #files_abc ;
my #abc_only = grep { !$first{$_} } #files_abc ;
foreach my $abc (sort #abc_only) {
$abc .= ".abc";
}
my %second = map {$_=>1} #files_abc;
my #same2 = grep { $second{$_} } #files_def; ##same and same2 are equal.
my #def_only = grep { !$second{$_} } #files_def;
foreach my $def (sort #def_only) {
$def .= ".def";
}
my #combine_all = sort (#same, #abc_only, #def_only);
print "\nCombine all:-\n #combine_all\n" ;
print "\nList of files with same extension\n #same";
print "\nList of files with abc only\n #abc_only";
print "\nList of files with def only\n #def_only";
foreach my $item (sort #combine_all) {
print FILE2 "$item\n" ;
}
close (FILE2) ;
My output is like this which is wrong:-
1st:- print screen output as below:
Combine all:-
file.abc file.abc file1 file10def.abc file2 file3 file4.abc file5 file6.def file7.abc
List of files with same extension
file1 file2 file3 file5
List of files with abc only
file4.abc file.abc file7.abc file.abc file10def.abc
List of files with def only
file6.def
Log output as below:
**file.abc
file.abc**
file1
file10def.abc
file2
file3
file4.abc
file5
file6.def
file7.abc
Can you pls help me take a look where gies wrong? Thanks heaps.
ALWAYS add
use strict;
use warnings;
to the head of your program. They will catch most simple errors before you need to ask for help.
You should always check whether a file open succeeded with open FILE, "reflog" or die $!;
You are using a variable $ine that doesn't exist. You mean $line
The lines you read into the array contain a trailing newline. Write chomp #lines; to remove them
Your regular expressions are wrong and you need || instead of &&. Instead write if ($line =~ /\.(iif|isp)$/)
If you still have problems when these are fixed then please ask again.
Aside from the errors already pointed out, you appear to be loading #lines from FUNC instead of FILE. Is that also a typo?
Also, If reflog truly contains a series of lines with one filename on each line, why would you ever expect the conditional "if ($line =~ /.abc/ && $line =~ /.def/)" to evaluate true?
It would really help if you could post an example from the actual file you are reading from, along with the actual code you are debugging. Or at least edit the question to fix the typos already mentioned
use strict;
use warnings;
my #files_abc;
my #files_def;
my $line;
open(FILE,'reflog') || die ("could not open reflog");
while ($line = <FILE>) {
if($line=~ /(.*)\.abc/) {
push(#files_abc,$1);
}
elsif($line=~ /(.*)\.def/) {
push(#files_def,$1);
}
}
close(FILE);
my %second = map {$_=>1} #files_def;
my #same = grep { $second{$_} } #files_abc;
print "\nList of files with same extension\n #same";
foreach my $abc (#files_abc) {
$abc .= ".abc";
}
foreach my $def (#files_def) {
$def .= ".def";
}
print "\nList of files with abc extension\n #files_abc";
print "\nList of files with def extension\n #files_def";
Output is
List of files with same extension
file1 file2 file3 file5
List of files with abc extension
file1.abc file2.abc file3.abc file4.abc file5.abc file7.abc file10def.abc
List of files with def extension
file2.def file3.def file5.def file6.def file8abc.def file1.def file9abc.def
Hope this helps...
You don't need to slurp the whole file; you can read one line at a time. I think this code works on this extended version of your reflog file:
xx.pl
#!/usr/bin/env perl
use strict;
use warnings;
open my $file, '<', "reflog" or die "Failed to open file reflog for reading ($!)";
open my $func, '>', 'log' or die "Failed to create file log for writing ($!)";
my ($oldline, $oldname, $oldextn) = ("", "", "");
while (my $newline = <$file>)
{
chomp $newline;
$newline =~ s/^\s*//;
my ($newname, $newextn) = ($newline =~ m/(.*)([.][^.]*)$/);
if ($oldname eq $newname)
{
# Found the same file - presumably $oldextn eq ".abc" and $newextn eq ".def"
print $func "$newname\n";
print "$newname\n";
$oldline = "";
$oldname = "";
$oldextn = "";
}
else
{
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
$oldline = $newline;
$oldname = $newname;
$oldextn = $newextn;
}
}
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
#unlink "reflog" ;
chmod 0644, "log";
close $func;
close $file;
Since the code does not actually check the extensions, it would be feasible to omit $oldextn and $newextn; on the other hand, you might well want to check the extensions if you're sufficiently worried about the input format to need to deal with leading white space.
I very seldom find it good for a processing script like this to remove its own input, hence I've left unlink "reflog"; commented out; your mileage may vary. I would also often just read from standard input and write to standard output; that would simplify the code quite a bit. This code writes to both the log file and to standard output; obviously, you can omit either output stream. I was too lazy to write a function to handle the writing, so the print statements come in pairs.
This is a variant on control-break reporting.
reflog
file1.abc
file1.def
file2.abc
file2.def
file3.abc
file3.def
file4.abc
file5.abc
file5.def
file6.def
file7.abc
Output
$ perl xx.pl
file1
file2
file3
file4.abc
file5
file6.def
file7.abc
$ cat log
file1
file2
file3
file4.abc
file5
file6.def
file7.abc
$
To handle unsorted file names with blank lines
#!/usr/bin/env perl
use strict;
use warnings;
open my $file, '<', "reflog" or die "Failed to open file reflog for reading ($!)";
open my $func, '>', 'log' or die "Failed to create file log for writing ($!)";
my #lines;
while (<$file>)
{
chomp;
next if m/^\s*$/;
push #lines, $_;
}
#lines = sort #lines;
my ($oldline, $oldname, $oldextn) = ("", "", "");
foreach my $newline (#lines)
{
chomp $newline;
$newline =~ s/^\s*//;
my ($newname, $newextn) = ($newline =~ m/(.*)([.][^.]*)$/);
if ($oldname eq $newname)
{
# Found the same file - presumably $oldextn eq ".abc" and $newextn eq ".def"
print $func "$newname\n";
print "$newname\n";
$oldline = "";
$oldname = "";
$oldextn = "";
}
else
{
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
$oldline = $newline;
$oldname = $newname;
$oldextn = $newextn;
}
}
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
#unlink "reflog" ;
chmod 0644, "log";
close $func;
close $file;
This is very similar to the original code I posted. The new lines are these:
my #lines;
while (<$file>)
{
chomp;
next if m/^\s*$/;
push #lines, $_;
}
#lines = sort #lines;
my ($oldline, $oldname, $oldextn) = ("", "", ""); # Old
foreach my $newline (#lines)
This reads the 'reflog' file, skipping blank lines, saving the rest in the #lines array. When the lines are all read, they're sorted. Then, instead of a loop reading from the file, the new code reads entries from the sorted array of lines. The rest of the processing is as before. For your described input file, the output is:
file1
file2
file3
Urgh: the chomp $newline; is not needed, though it is not otherwise harmful. The old-fashioned chop (a precursor to chomp) would have been dangerous. Score one for modern Perl.
open( FILE, "reflog" );
open( FUNC, '>log' );
my %seen;
while ( chomp( my $line = <FILE> ) ) {
$line =~ s/^\s*//;
if ( $ine =~ /(\.+)\.(abc|def)$/ ) {
$seen{$1}++;
}
}
foreach my $file ( keys %seen ) {
if ( $seen{$file} > 1 ) {
## do whatever you want to
}
}
unlink "reflog";
chmod( 0750, "log" );
close(FUNC);
close(FILE);

help merging perl code routines together for file processing

I need some perl help in putting these (2) processes/code to work together. I was able to get them working individually to test, but I need help bringing them together especially with using the loop constructs. I'm not sure if I should go with foreach..anyways the code is below.
Also, any best practices would be great too as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if i read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
next if ($file =~ /^\.+$/);
#Get filename attributes
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
print "$1\n";
print "$2\n";
print "$3\n";
}
print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my #heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
my $digest = md5_hex($data);
chomp;
my (#values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2" ;
$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(#values));
$data .= "$extra$eorec";
print NEWFILE "$data";
}
#print $data;
close (NEWFILE);
You are using an old-style of Perl programming. I recommend you to use functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...
sub get_input_files {
# return an array of files (#)
}
sub extract_file_info {
# takes the file name and returs an array of values (filename attrs)
}
sub process_file {
# reads the input file, takes the previous attribs and build the output file
}
my #ifiles = get_input_files;
foreach my $ifile(#ifiles) {
my #attrs = extract_file_info($ifile);
process_file($ifile, #attrs);
}
Hope it helps
I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl
# - Never forget these!
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
# Parens on postfix "if" are optional; I prefer to omit them
next if $file =~ /^\.+$/;
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
process_file($file, $1, $2, $3);
}
print "$file\n";
}
sub process_file {
my ($orig_name, $foo_x, $name_x, $p_x) = #_;
my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";
# - From your description of the task, it sounds like we actually want to
# read from the found file, not from <>, so opening it here to read
# - Better to use lexical ("my") filehandle and three-arg form of open
# - "or" has lower operator precedence than "||", so less chance of
# things being grouped in the wrong order (though either works here)
# - Including $! in the error will tell why the file open failed
open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";
my $data = '';
my $line1 = <$in_fh>;
chomp $line1;
my #heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
while (<$in_fh>) {
chomp;
my $digest = md5_hex($data);
my (#values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2";
$extra .= "$heading[$_]$sep1$values[$_]$sep2"
for (0 .. scalar(#values));
# - Useless use of double quotes removed on next two lines
$data .= $extra . $eorec;
#print $out_fh $data;
}
# - Moved print to output file to here (where it will print the complete
# output all at once) rather than within the loop (where it will print
# all previous lines each time a new line is read in) to prevent
# duplicate output records. This could also be achieved by printing
# $extra inside the loop. Printing $data at the end will be slightly
# faster, but requires more memory; printing $extra within the loop and
# getting rid of $data entirely would require less memory, so that may
# be the better option if you find yourself needing to read huge input
# files.
print $out_fh $data;
# - $in_fh and $out_fh will be closed automatically when it goes out of
# scope at the end of the block/sub, so there's no real point to
# explicitly closing it unless you're going to check whether the close
# succeeded or failed (which can happen in odd cases usually involving
# full or failing disks when writing; I'm not aware of any way that
# closing a file open for reading can fail, so that's just being left
# implicit)
close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.