perl creates unexplained file called "1"

I have been trying to get rid of a weird bug for hours, with no success. I have a subroutine that sorts a file. Here is the code:
sub sort_file {
$filename = @_;
print @_;
print $filename;
open(SRTINFILE,"<$filename");
@lines=<SRTINFILE>;
close(SRTINFILE);
open(SRTOUTFILE,">$filename");
@sorted = sort { @aa=split(/ /,$a); @bb=split(/ /,$b); return ($aa[1] <=> $bb[1]); } @lines;
print SRTOUTFILE @sorted;
close(SRTOUTFILE);
}
Any time this function is run, Perl creates a file called "1". I have no idea why. I am a complete Perl noob and am just using it for quick and dirty text file processing. Anyone know what's wrong?

An array in scalar context evaluates to the number of elements in the array. If you pass one argument to the function, the following assigns 1 to $filename.
$filename = @_;
You want any of the following:
$filename = $_[0];
$filename = shift;
($filename) = @_;
Furthermore, you want to limit the scope of the variable to the function, so you want
my $filename = $_[0];
my $filename = shift;
my ($filename) = @_;
(my $filename) = @_; # Exact same as previous.
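To see the two contexts side by side, here is a tiny self-contained demo (the sub names are invented for illustration):
sub arg_count { my $n = @_; return $n }     # scalar context: number of arguments
sub first_arg { my ($x) = @_; return $x }   # list context: the first argument
print arg_count('foo.txt'), "\n";   # prints 1
print first_arg('foo.txt'), "\n";   # prints foo.txt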

The other answers are sufficient to tell you why you were getting strange errors.
I would like to show you how a more experienced Perl programmer might write this subroutine.
use warnings;
use strict;
use autodie;
sub sort_file {
my( $filename ) = @_;
my @lines;
{
# 3 arg open
open my $in_fh, '<', $filename;
@lines = <$in_fh>;
close $in_fh;
}
# Schwartzian transform
my @sorted = map {
$_->[0]
} sort {
$a->[2] <=> $b->[2]
} map {
[ $_, split ' ', $_ ]
} @lines;
{
open my $out_fh, '>', $filename;
print {$out_fh} @sorted;
close $out_fh;
}
}
use strict;
prevents you from using a variable without declaring it (among other things).
use warnings;
Informs you of some potential errors.
use autodie;
Now you don't need to write open .... or die ....
{ open ...; @lines = <$fh>; close $fh }
Limits the scope of the FileHandle.
@sorted = map { ... } sort { ... } map { ... } @list
This is an example of a Schwartzian transform, which reduces the number of times that the values are split. In this example, it may be overkill.
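For illustration, here is the same pattern on a small in-memory list (the data is made up), sorting lines by their second whitespace-separated field:
my @lines = ("b 20\n", "a 10\n", "c 15\n");
my @sorted = map  { $_->[0] }                  # keep only the original line
             sort { $a->[1] <=> $b->[1] }      # compare the precomputed key
             map  { [ $_, (split ' ')[1] ] }   # pair each line with its key
             @lines;
print @sorted;   # "a 10", "c 15", "b 20"
Each line is split exactly once, instead of twice per comparison inside sort.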

How confusing. Assigning $filename = @_ the way you are means that you are evaluating an array in scalar context, which means that $filename is assigned the number of elements in @_. Because you don't check whether the first open call succeeds, reading the file "1" likely fails, but you continue anyway and open for writing a file named "1". The solution is to use $filename in list context and begin your subroutine with ($filename) = @_ or $filename = shift.
Why aren't you using use strict by the way?

Always use:
use strict;
use warnings;
Then Perl will tell you when you're off the mark.
As you've observed, the notation:
$filename = @_;
means that an unscoped variable is assigned the number of elements in the argument list to the function, and since you pass one file, the name of the created file will be '1'.
You meant to write:
my($filename) = @_;
This provides list context for the array, and assigns $_[0] to $filename, ignoring any extra arguments to the function.

OK... never mind. It just dawned on me. $filename = @_; makes no sense. Should be $filename = @_[0];. There goes 2 hours of my life. Note to other Perl noobs: beware.


Perl - Could not open and read files

I've created a script for validating XML files in a given input folder. It should grep the XML files from the input directory, then sort the XML files and check the condition. But it throws the error "not Open at line , <STDIN> line 1." and only creates an empty log file.
Since I faced a numeric error while sorting, I commented that sort line out.
So, given an input location, the script should check the XML files and report errors in the mentioned log file.
Can anyone help with this?
Script
#!/usr/bin/perl
# use strict;
use warnings;
use Cwd;
use File::Basename;
use File::Path;
use File::Copy;
use File::Find;
print "Enter the path: ";
my $filepath = <STDIN>;
chomp $filepath;
die "\n\tpleas give input folder \n" if(!defined $filepath or !-d $filepath);
my $Toolpath = dirname($0);
my $base = basename($filepath);
my $base_path = dirname($filepath);
my ($xmlF, @xmlF);
my @errors=();
my @warnings=();
my @checkings=();
my $ecount=0;
my $wcount=0;
my $ccount=0;
my ($x, $y);
my $z="0";
opendir(DIR,"$filepath");
my #xmlFiles = grep{/\.xml$/} readdir(DIR);
closedir(DIR);
my $logfile = "$base_path\\$base"."_Err.log";
# #xmlF=sort{$a <=> $b}#xmlFiles;
#xmlF=sort{$a cmp $b}#xmlFiles;
open(OUT, ">$logfile") || die ("\nLog file couldnt write $logfile :$!");
my $line;
my $flcnt = scalar (@xmlF);
for ($x=0; $x < $flcnt; $x++)
{
open IN, "$xmlF[$x]" or die "not Open";
print OUT "\n".$xmlF[$x]."\n==================\n";
print "\nProcessing File $xmlF[$x] .....\n";
local $/;
while ($line=<IN>)
{
while ($line=~m#(<res(?: [^>]+)? type="weblink"[^>]*>)((?:(?!</res>).)*)</res>#igs)
{
my $tmp1 = $1; my $tmp2 = $&; my $pre1 = $`;
if($tmp1 =~ m{ subgroup="Weblink"}i){
my $pre = $pre1.$`;
if($tmp2 !~ m{<tooltip><\!\[CDATA\[Weblink\]\]><\/tooltip>}ms){
my $pre = $pre1.$`;
push(@errors,lineno($pre),"\t<tooltip><\!\[CDATA\[Weblink\]\]></tooltip> is missing\n");
}
}
}
foreach my $warnings(@warnings)
{
$wcount = $wcount+1;
}
foreach my $checkings(@checkings)
{
$ccount = $ccount+1;
}
foreach my $errors(@errors)
{
$ecount = $ecount+1;
}
my $count_err = $ecount/2;
print OUT "".$count_err." Error(s) Found:-\n------------------------\n ";
print OUT "#errors\n";
$ecount = 0;
my $count_war = $wcount/2;
print OUT "$count_war Warning(s) Found:-\n-------------------------\n ";
print OUT "#warnings\n";
$wcount = 0;
my $count_check = $ccount/2;
print OUT "$count_check Checking(s) Found:-\n-------------------------\n ";
print OUT "#checkings\n";
$wcount = 0;
undef @errors;
undef @warnings;
undef @checkings;
close IN;
}
}
readdir returns bare file names, without the path.
So when you go on to open those files you need to prepend the names returned by readdir with the name of the directory readdir read them from, here $filepath. Or build the full path names right away:
use warnings;
use strict;
use feature 'say';
use File::Spec;
print "Enter the path: ";
my $filepath = <STDIN>;
chomp $filepath;
die "\nPlease give input folder\n" if !defined $filepath or !-d $filepath;
opendir(my $fh_dir, $filepath) or die "Can't opendir $filepath: $!";
my @xml_files =
map { File::Spec->catfile($filepath, $_) }
grep { /\.xml$/ }
readdir $fh_dir;
closedir $fh_dir;
say for @xml_files;
where I used File::Spec to portably piece together the file name.
The map can be made to also do grep's job so to make only one pass over the file list
my @xml_files =
map { /\.xml$/ ? File::Spec->catfile($filepath, $_) : () }
readdir $fh_dir;
The empty list () gets flattened in the returned list, effectively disappearing altogether.
Here are some comments on the code. Note that this is normally done at Code Review but I feel that it is needed here.
First: a long list of variables is declared upfront. It is in fact important to declare in as small a scope as possible. It turns out that most of those variables can indeed be declared where they are used, as seen in comments below.
The location of the executable is best found using
use FindBin qw($RealBin);
where $RealBin also resolves links (as opposed to $Bin, which is also available)
Assigning () to an array at declaration doesn't do anything; it is exactly the same as plain my @errors;. They can also go together: my (@errors, @warnings, @checks);. If the array already holds something then = () clears it, which is a good way to empty an array
Assigning a "0" makes the variable a string. While Perl normally converts between strings and numbers as needed, if a number is needed then use a number, my $z = 0;
Lexical filehandles (open my $fh, ...) are better than globs (open FH, ...)
I don't understand the comment about the "numeric error" in sorting. The cmp operator sorts lexicographically; for a numeric sort use <=>
When an array is used in scalar context – when assigned to a scalar, for example – the number of elements is returned. So there is no need for scalar, but do my $flcnt = @xmlF;
For iteration over array indices use $#ary, the index of the last element of @ary, for
foreach my $i (0..$#xmlF) { ... }
But if there aren't any uses of the index (I don't see any) then loop over elements
foreach my $file (@xmlF) { ... }
When you check the file open, print the error $!: open ... or die "... : $!";. This is done elsewhere in the code, and it should always be done.
The local $/; unsets the input record separator, which makes the following read take the whole file. If that is intended then $line is not a good name. Also note that a variable can be declared inside the condition, while (my $line = <$fh>) { }
I can't comment on the regex as I don't know what it's supposed to accomplish, but it is complex; any chance to simplify all that?
The series of foreach loops only works out the number of elements of those arrays; there is no need for loops then, just my $ecount = @errors; (etc). This also allows you to keep the declaration of those counter variables in minimal scope.
The undef @errors; (etc) aren't needed since those arrays count for each file, so you can declare them inside the loop, anew at each iteration (and at the smallest scope). When you do wish to empty an array it is better to do @ary = (); than to undef it; that way it's not allocated all over again on the next use. A sketch pulling several of these points together follows.
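Putting these points together, the per-file bookkeeping might look something like this (a sketch only, reusing the variable names from the question):
foreach my $file (@xmlF) {
    open my $in, '<', $file or die "Can't open $file: $!";
    my (@errors, @warnings, @checkings);   # scoped to this file, fresh each iteration
    # ... scan the file and push messages onto the arrays ...
    my $ecount = @errors;                  # scalar context gives the element count
    print OUT "$ecount Error(s) Found\n";
    close $in;
}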

Perl Merge file

I have 3 or multiple files I need to merge, the data looks like this..
file 1
0334.45656
0334.45678
0335.67899
file 2
0334.89765
0335.12346
0335.56789
file 3
0334.12345
0335.45678
0335.98764
Expected output in file 4,
0334.89765
0334.89765
0334.89765
0334.12345
0335.67899
0335.12346
0335.56789
0335.45678
0335.98764
So far I have tried the following, but the data in the fourth file does not come out in sorted order:
#!/usr/bin/perl
my %hash;
my $outFile = "outFile.txt";
foreach $file(@ARGV)
{
print "$file\n";
open (IN, "$file") || die "cannot open file $!";
open (OUT,">>$outFile") || die "cannot open file $!";
while ( <IN> )
{
chomp $_;
($timestamp,$data) = split (/\./,$_);
$hash{$timeStamp}{'data'}=$data;
if (defined $hash{$timeStamp})
{
print "$_\n";
print OUT"$_\n";
}
}
}
close (IN);
close (OUT);
I wouldn't normally suggest this, but unix utilities should be able to handle this just fine.
cat the 3 files together.
use sort to sort the merged file.
However, using perl, you could just do the following:
#!/usr/bin/perl
use strict;
use warnings;
my @data;
push @data, $_ while (<>);
# Because the numbers are all equal length, alpha sort will work here
print for sort @data;
However, as we've discussed, it's possible that the files will be extremely large. Therefore it will be more efficient both in memory and speed if you're able to take advantage of the fact that all the files are already sorted.
The following solution therefore streams the files, pulling out the next one in order each loop of the while:
#!/usr/bin/perl
# Could name this catsort.pl
use strict;
use warnings;
use autodie;
# Initialize File handles
my @fhs = map {open my $fh, '<', $_; $fh} @ARGV;
# First Line of each file
my @data = map {scalar <$_>} @fhs;
# Loop while a next line exists
while (@data) {
# Pull out the next entry.
my $index = (sort {$data[$a] cmp $data[$b]} (0..$#data))[0];
print $data[$index];
# Fill In next Data at index.
if (! defined($data[$index] = readline $fhs[$index])) {
# End of that File
splice @fhs, $index, 1;
splice @data, $index, 1;
}
}
}
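Assuming the script is saved as catsort.pl as the comment suggests, it could be run as perl catsort.pl file1 file2 file3 > file4.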
Using Miller's idea in a more reusable way,
use strict;
use warnings;
sub get_sort_iterator {
my @fhs = map {open my $fh, '<', $_ or die $!; $fh} @_;
my @d;
return sub {
for my $i (0 .. $#fhs) {
# skip to the next file handle if it doesn't exist or we already have a value in $d[$i]
next if !$fhs[$i] or defined $d[$i];
# reading from $fhs[$i] file handle was success?
if ( defined($d[$i] = readline($fhs[$i])) ) { chomp($d[$i]) }
# file handle at EOF, not needed any more
else { undef $fhs[$i] }
}
# compare as numbers, return undef if no more data
my ($index) = sort {$d[$a] <=> $d[$b]} grep { defined $d[$_] } 0..$#d
or return;
# return value from $d[$index], and set it to undef
return delete $d[$index];
};
}
my $iter = get_sort_iterator(@ARGV);
while (defined(my $x = $iter->())) {
print "$x\n";
}
output
0334.12345
0334.45656
0334.45678
0334.89765
0335.12346
0335.45678
0335.56789
0335.67899
0335.98764
Supposing every input file is already in ascending order and has at least one line in it, this script could merge them in ascending order:
#!/usr/bin/perl
use warnings;
use strict;
use List::Util 'reduce';
sub min_index {
reduce { $_[$a] < $_[$b] ? $a : $b } 0 .. $#_;
}
my @fhs = map { open my $fh, '<', $_; $fh } @ARGV;
my @data = map { scalar <$_> } @fhs;
while (@data) {
my $idx = min_index(@data);
print "$data[$idx]";
if (! defined($data[$idx] = readline $fhs[$idx])) {
splice @data, $idx, 1;
splice @fhs, $idx, 1;
}
}
Note: this is basically the same as the second script offered by @Miller, but a bit clearer and more concise.
I suggest this solution, which uses a sorted array of hashes - each hash corresponding to an input file, and containing a file handle fh, the last line read in line, and the timestamp extracted from that line in timestamp.
The hash at the end of the array always corresponds to the input that has the smallest value for the timestamp, so all that is necessary is to repeatedly pop the next value from the array, print its data, read the next line, and (if it hasn't reached eof) insert it back into the array in sorted order.
This could produce an appreciable increase in speed over the repeated sorting of all the data for each output line that other answers use.
Note that the program expects the list of input files as parameters on the command line, and sends its merged output to STDOUT. It also assumes that the input files are already sorted.
use strict;
use warnings;
use autodie;
my @data;
for my $file (@ARGV) {
my $item;
open $item->{fh}, '<', $file;
insert_item($item, \@data);
}
while (@data) {
my $item = pop @data;
print $item->{line};
insert_item($item, \@data);
}
sub insert_item {
my ($item, $array) = @_;
return if eof $item->{fh};
$item->{line} = readline $item->{fh};
($item->{timestamp}) = $item->{line} =~ /^(\d+)/;
my $i = 0;
++$i while $i < @$array and $item->{timestamp} < $array->[$i]{timestamp};
splice @$array, $i, 0, $item;
}
output
0334.45656
0334.89765
0334.12345
0334.45678
0335.12346
0335.45678
0335.67899
0335.56789
0335.98764

Count the occurrences of a string in a file

I wrote a perl script to count the occurrences of a character in a file.
So far this is what I have got,
#!/usr/bin/perl -w
use warnings;
no warnings ('uninitialized', 'substr');
my $lines_ref;
my #lines;
my $count;
sub countModule()
{
my $file = "/test";
open my $fh, "<",$file or die "could not open $file: $!";
my @contents = $fh;
my @filtered = grep (/\// ,@contents);
return \@filtered;
}
@lines = countModule();
#@lines = $lines_ref;
$count = @lines;
print "###########\n $count \n###########\n";
My test file looks like this:
10.0.0.1/24
192.168.10.0/24
172.16.30.1/24
I am basically trying to count the number of instances of "/"
This is the output that I get:
###########
1
###########
I am getting 1 instead of 3, which is the number of occurrences.
Still learning Perl, so any help will be appreciated. Thank you!
Here are a few points about your code
You should always use strict at the top of your program, and only use no warnings for special reasons in a limited scope. There is no general reason why a working Perl program should need to disable warnings globally
Declare your variables close to their first point of use. The style of declaring everything at the top of the file is unnecessary and is a legacy of C
Never use prototypes in your code. They are available for very special purposes and shouldn't be used for the vast majority of Perl code. sub countModule() { ... } insists that countModule may never be called with any parameters and isn't necessary or useful. The definition should be just sub countModule { ... }
A big well done! for using a lexical file handle, the three-parameter form of open, and putting $! in your die string
my @contents = $fh will just set @contents to a single-element list containing just the filehandle. To read the whole file into the array you need my @contents = <$fh>
You can avoid escaping slashes in a regular expression if you use a different delimiter. To do that you need to use the m operator explicitly, like my @filtered = grep m|/|, @contents
You return an array reference but assign the returned value to an array, so @lines = countModule() sets @lines to a single-element list containing just the array reference. You should either return a list with return @filtered or dereference the return value on assignment with @lines = @{ countModule() }
If all you need to do is to print the number of lines in the file that contain a slash character then you could write something like this
use strict;
use warnings;
my $count;
sub countModule {
my $file = '/test';
open my $fh, '<', $file or die "Could not open $file: $!";
return [ grep m|/|, <$fh> ];
}
my $lines = countModule;
$count = @$lines;
print "###########\n $count \n###########\n";
Close, but a few issues:
use strict;
use warnings;
sub countModule
{
my $file = "/test";
open my $fh, "<",$file or die "could not open $file: $!";
my @contents = <$fh>; # The <> brackets are used to read from $fh.
my @filtered = grep (/\// ,@contents);
return @filtered; # Remove the reference.
}
my @lines = countModule();
my $count = scalar @lines; # 'scalar' is not required, but lends clarity.
print "###########\n $count \n###########\n";
Each of the changes I made to your code are annotated with a #comment explaining what was done.
Now in list context your subroutine will return the filtered lines. In scalar context it will return a count of how many lines were filtered.
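For example, with the corrected sub above:
my @lines = countModule();   # list context: the matching lines themselves
my $count = countModule();   # scalar context: how many lines matched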
You did also mention finding the occurrences of a character (despite everything in your script being line-oriented). Perhaps your counter sub would look like this:
sub file_tallies{
my $file = '/test';
open my $fh, '<', $file or die $!;
my $count;
my $lines;
while( <$fh> ) {
$lines++;
$count += $_ =~ tr[\/][\/];
}
return ( $lines, $count );
}
my( $line_count, $slash_count ) = file_tallies();
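The tr operator here is used purely as a counter: with identical search and replacement lists it returns the number of matched characters without changing the string. A simpler spelling that avoids the escaped slashes, using braces as delimiters:
$count += tr{/}{};   # counts the '/' characters in $_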
In list context,
return \@filtered;
returns a list with one element -- a reference to the named array @filtered. Maybe you wanted to return the list itself
return @filtered;
Here's some simpler code:
sub countMatches {
my ($file, $c) = @_; # Pass parameters
local $/; # Slurp input: undefine the record separator so <> reads the whole file
open my $fh, "<",$file or die "could not open $file: $!";
my $s = <$fh>; # The <> brackets are used to read from $fh.
close $fh;
my $ptn = quotemeta($c); # So we can match strings like ".*" verbatim
my @hits = $s =~ m/($ptn)/g;
0 + @hits
}
print countMatches ("/test", '/') . "\n";
The code pushes Perl beyond the very basics, but not too much. Salient points:
By undeffing $/ you can read the input into one string. If you're counting
occurrences of a string in a file, and not occurrences of lines that contain
the string, this is usually easier to do.
m/(...)/g will find all the hits, but if you want to count strings like
"." you need to quote the meta characters in them.
Store the results in an array to evaluate m// in list context
Adding 0 to an array evaluates it in scalar context, giving the number of items in it.
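A tiny demonstration of those last two points (the string is invented):
my $s = 'a/b/c';
my @hits = $s =~ m{(/)}g;   # list context: one element per match
print 0 + @hits, "\n";      # prints 2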

working with file handles in Perl

I am writing a comparefiles subroutine in Perl that reads a line of text from one file (f1) and then searches for it in another (f2) in the normal O(n^2) way.
sub comparefiles {
my($f1, $f2) = @_;
while(<f1>) {
# reset f2 to the beginning of the file
while(<f2>) {
}
}
}
sub someother {
open (one, "<one.out");
open (two, "<two.out");
&comparefiles(&one, &two);
}
I have two questions
How do I pass the file handles to the subroutine? In the above code, I have used them as scalars. Is that the correct way?
How do I reset the file pointer f2 to the beginning of the file at the position marked in the comment above?
First of all, always start your script with:
use strict;
use warnings;
Use lexical filehandles, the three-arg open, and test the result:
open my $fh1 , '<' , $filename1 or die "can't open '$filename1' for reading : $!";
Then you can pass the filehandles to the sub :
comparefiles($fh1, $fh2);
To rewind the file use the seek function (perldoc -f seek)
seek $fh, 0, 0;
If the files are small enough to fit in memory, you might consider storing the lines in a hash, which would prevent the need for O(n^2) searching.
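A sketch of that hash-based approach, assuming exact line-for-line matches are what is wanted (the reporting is invented):
sub comparefiles {
    my ($fh1, $fh2) = @_;
    my %seen;
    while (my $line = <$fh2>) {
        $seen{$line} = 1;               # index every line of the second file once
    }
    while (my $line = <$fh1>) {
        print "common: $line" if $seen{$line};
    }
}
This reads each file only once, so the O(n^2) rescanning (and the rewind) disappears entirely.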
Within the framework of your existing approach, I would advise against nesting your file reading loops -- perhaps on aesthetic grounds if nothing else. Instead, put the inner loop in a subroutine.
use strict;
use warnings;
# Works for 2 or more files.
analyze_files(@ARGV);
sub analyze_files {
my @file_names = @_;
my @handles = map { open my $h, '<', $_; $h } @_;
my $fh = shift #handles;
while (my $line = <$fh>) {
my @line_numbers = map { find_in_file($_, $line) } @handles;
print join("\t", @line_numbers, $line);
}
}
# Takes a file handle and a line to hunt for.
# Returns line number if the line is found.
sub find_in_file {
my ($fh, $find_this) = @_;
seek $fh, 0, 0;
while (my $line = <$fh>){
return $. if $line eq $find_this;
}
return -1; # Not found.
}

How can I read from a Perl filehandle that is an array element?

I quickly jotted off a Perl script that would average a few files with just columns of numbers. It involves reading from an array of filehandles. Here is the script:
#!/usr/local/bin/perl
use strict;
use warnings;
use Symbol;
die "Usage: $0 file1 [file2 ...]\n" unless scalar(#ARGV);
my #fhs;
foreach(#ARGV){
my $fh = gensym;
open $fh, $_ or die "Unable to open \"$_\"";
push(#fhs, $fh);
}
while (scalar(#fhs)){
my ($result, $n, $a, $i) = (0,0,0,0);
while ($i <= $#fhs){
if ($a = <$fhs[$i]>){
$result += $a;
$n++;
$i++;
}
else{
$fhs[$i]->close;
splice(@fhs,$i,1);
}
}
if ($n){ print $result/$n . "\n"; }
}
This doesn't work. If I debug the script, after I initialize @fhs it looks like this:
DB<1> x #fhs
0 GLOB(0x10443d80)
-> *Symbol::GEN0
FileHandle({*Symbol::GEN0}) => fileno(6)
1 GLOB(0x10443e60)
-> *Symbol::GEN1
FileHandle({*Symbol::GEN1}) => fileno(7)
So far, so good. But it fails at the part where I try to read from the file:
DB<3> x $fhs[$i]
0 GLOB(0x10443d80)
-> *Symbol::GEN0
FileHandle({*Symbol::GEN0}) => fileno(6)
DB<4> x $a
0 'GLOB(0x10443d80)'
$a is filled with this string rather than something read from the glob. What have I done wrong?
You can only use a simple scalar variable inside <> to read from a filehandle. <$foo> works. <$foo[0]> does not read from a filehandle; it's actually equivalent to glob($foo[0]). You'll have to use the readline builtin, a temporary variable, or use IO::File and OO notation.
$text = readline($foo[0]);
# or
my $fh = $foo[0]; $text = <$fh>;
# or
$text = $foo[0]->getline; # If using IO::File
If you weren't deleting elements from the array inside the loop, you could easily use a temporary variable by changing your while loop to a foreach loop.
Personally, I think using gensym to create filehandles is an ugly hack. You should either use IO::File, or pass an undefined variable to open (which requires at least Perl 5.6.0, but that's almost 10 years old now). (Just say my $fh; instead of my $fh = gensym;, and Perl will automatically create a new filehandle and store it in $fh when you call open.)
If you are willing to use a bit of magic, you can do this very simply:
use strict;
use warnings;
die "Usage: $0 file1 [file2 ...]\n" unless #ARGV;
my $sum = 0;
# The current filehandle is aliased to ARGV
while (<>) {
$sum += $_;
}
continue {
# We have finished a file:
if( eof ARGV ) {
# $. is the current line number.
print $sum/$. , "\n" if $.;
$sum = 0;
# Closing ARGV resets $. because ARGV is
# implicitly reopened for the next file.
close ARGV;
}
}
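Invoked as, say, perl average.pl file1 file2 (the script name is arbitrary), this prints one average per input file.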
Unless you are using a very old perl, the messing about with gensym is not necessary. IIRC, perl 5.6 and newer are happy with normal lexical handles: open my $fh, '<', 'foo';
I have trouble understanding your logic. Do you want to read several files, each containing just numbers (one number per line), and print the average of each?
use strict;
use warnings;
my @fh;
foreach my $f (@ARGV) {
open(my $fh, '<', $f) or die "Cannot open $f: $!";
push @fh, $fh;
}
foreach my $fh (@fh) {
my ($sum, $n) = (0, 0);
while (<$fh>) {
$sum += $_;
$n++;
}
print "$sum / $n: ", $sum / $n, "\n" if $n;
}
Seems like a for loop would work better for you, where you could actually use the standard read (iteration) operator.
for my $fh ( @fhs ) {
while ( defined( my $line = <$fh> )) {
# since we're reading integers we test for *defined*
# so we don't close the file on '0'
#...
}
close $fh;
}
It doesn't look like you want to shortcut the loop at all. Therefore, while seems to be the wrong loop idiom.