Is there a way to get the line number (and maybe filename) where a __DATA__ token was coded? Or some other way to know the actual line number in the original source file where a line of data read from the DATA filehandle came from?
Note that $. counts from 1 when reading from the DATA filehandle. So if the line number of the __DATA__ token were added to $. it would be what I'm looking for.
For example:
#!/usr/bin/perl
while (<DATA>) {
my $n = $. + WHAT??;
die "Invalid data at line $n\n" if /bad/;
}
__DATA__
something good
something bad
I want this to say "Invalid data at line 9", not "line 2" (which is what you get if $. is used by itself).
In systems that support /proc/<pid> virtual filesystems (e.g., Linux), you can do:
# find the file where <DATA> handle is read from
my $DATA_FILE = readlink("/proc/$$/fd/" . fileno(*DATA));
# find the line where DATA begins
open my $THIS, "<", $DATA_FILE;
my @THIS = <$THIS>;
my ($DATA_LINE) = grep { $THIS[$_] =~ /^__DATA__\b/ } 0 .. $#THIS;
Files don't actually have lines; they're just sequences of bytes. The OS doesn't even offer the capability of getting a line from a file, so it has no concept of line numbers.
Perl, on the other hand, does keep track of a line number for each handle. It is accessed via $..
However, the Perl handle DATA is created from a file descriptor that has already been moved to the start of the data (it's the file descriptor that Perl itself uses to load and parse the file), so there's no record of how many lines have already been read. As a result, line 1 of DATA is the first line after __DATA__.
To correct the line count, one must seek back to the start of the file, and read it line by line until the file handle is back at the same position it started.
#!/usr/bin/perl
use strict;
use warnings qw( all );
use Fcntl qw( SEEK_SET );
# Determines the line number at the current file position without using «$.».
# Corrects the value of «$.» and returns the line number.
# Sets «$.» to «1» and returns «undef» if unable to determine the line number.
# The handle is left pointing to the same position as when this was called, or this dies.
sub fix_line_number {
my ($fh) = @_;
( my $initial_pos = tell($fh) ) >= 0
or return undef;
seek($fh, 0, SEEK_SET)
or return undef;
$. = 1;
while (<$fh>) {
( my $pos = tell($fh) ) >= 0
or last;
if ($pos >= $initial_pos) {
if ($pos > $initial_pos) {
seek($fh, $initial_pos, SEEK_SET)
or die("Can't reset handle: $!\n");
}
return $.;
}
}
seek($fh, $initial_pos, SEEK_SET)
or die("Can't reset handle: $!\n");
$. = 1;
return undef;
}
my $prefix = fix_line_number(\*DATA) ? "" : "+";
while (<DATA>) {
printf "%s:%s: %s", __FILE__, "$prefix$.", $_;
}
__DATA__
foo
bar
baz
Output:
$ ./a.pl
./a.pl:48: foo
./a.pl:49: bar
./a.pl:50: baz
$ perl <( cat a.pl )
/dev/fd/63:+1: foo
/dev/fd/63:+2: bar
/dev/fd/63:+3: baz
Perl keeps track of the file and line at which each symbol is created. A symbol is normally created when the parser/compiler first encounters it. But if __DATA__ is encountered before DATA is otherwise created, this will create the symbol. We can take advantage of this to set the line number associated with the file handle in DATA.
For the case where the Package::DATA handle is not used in Package.pm itself, the line number of the __DATA__ token could be obtained via B::GV->LINE on the DATA handle:
$ cat Foo.pm
package Foo;
1;
__DATA__
good
bad
$ perl -I. -MFoo -MB -e '
my $ln = B::svref_2object(\*Foo::DATA)->LINE;
warn "__DATA__ at line $ln\n";
Foo::DATA->input_line_number($ln);
while(<Foo::DATA>){ die "no good" unless /good/ }
'
__DATA__ at line 4
no good at -e line 1, <DATA> line 6.
In the case where the DATA handle is referenced in the file itself, a possible kludge would be to use an @INC hook:
$ cat DH.pm
package DH;
unshift @INC, sub {
my ($sub, $fname) = @_;
for(@INC){
if(open my $fh, '<', my $fpath = "$_/$fname"){
$INC{$fname} = $fpath;
return \'', $fh, sub {
our (%ln, %pos);
if($_){ $pos{$fname} += length; ++$ln{$fname} }
}
}
}
};
$ cat Bar.pm
package Bar;
print while <DATA>;
1;
__DATA__
good
bad
$ perl -I. -MDH -MBar -e '
my $fn = "Bar.pm";
warn "__DATA__ at line $DH::ln{$fn} pos $DH::pos{$fn}\n";
seek Bar::DATA, $DH::pos{$fn}, 0;
Bar::DATA->input_line_number($DH::ln{$fn});
while (<Bar::DATA>){ die "no good" unless /good/ }
'
good
bad
__DATA__ at line 6 pos 47
no good at -e line 6, <DATA> line 8.
Just for the sake of completeness, in the case where you do have control over the file, it can all be done easily with:
print "$.: $_" while <DATA>;
BEGIN { our $ln = __LINE__ + 1; DATA->input_line_number($ln) }
__DATA__
...
You can also use the first B::GV solution, provided that you reference the DATA handle via an eval:
use B;
my ($ln, $data) = eval q{B::svref_2object(\*DATA)->LINE, \*DATA}; die $@ if $@;
$data->input_line_number($ln);
print "$.: $_" while <$data>;
__DATA__
...
None of these solutions assumes that the source files are seekable (except if you want to read the DATA more than once, as I did in the second example), or tries to reparse your files, etc.
Comparing the end of the file to itself in reverse might do what you want:
#!/usr/bin/perl
open my $f, "<", $0;
my @lines;
my @dataLines;
push @lines ,$_ while <$f>;
close $f;
push @dataLines, $_ while <DATA>;
my @revLines= reverse @lines;
my @revDataLines=reverse @dataLines;
my $count=@lines;
my $offset=0;
$offset++ while ($revLines[$offset] eq $revDataLines[$offset]);
$count-=$offset;
print "__DATA__ section is at line $count\n";
__DATA__
Hello there
"Some other __DATA__
lkjasdlkjasdfklj
ljkasdf
Running it gives this output:
__DATA__ section is at line 19
The above script reads itself (using $0 for the file name) into the @lines array and reads the DATA filehandle into the @dataLines array.
The arrays are reversed and then compared element by element until they differ. The number of matching lines is tracked in $offset, and this is subtracted from $count, which is the number of lines in the file.
The result is the line number the DATA section starts at. Hope that helps.
Thank you @mosvy for the clever and general idea.
Below is a consolidated solution which works anywhere. It uses a symbolic reference instead of eval to avoid mentioning "DATA" at compile time, but otherwise uses the same ideas as mosvy.
The important point is that code in a package containing __DATA__ must not refer to the DATA symbol by name so that that symbol won't be created until the compiler sees the __DATA__ token. The way to avoid mentioning DATA is to use a filehandle ref created at run-time.
# Get the DATA filehandle for a package (default: the caller's),
# fixed so that "$." provides the actual line number in the
# original source file where the last-read line of data came
# from, rather than counting from 1.
#
# In scalar context, returns the fixed filehandle.
# In list context, returns ($fh, $filename)
#
# For this to work, a package containing __DATA__ must not
# explicitly refer to the DATA symbol by name, so that the
# DATA symbol (glob) will not yet be created when the compiler
# encounters the __DATA__ token.
#
# Therefore, use the filehandle ref returned by this
# function instead of DATA!
#
sub get_DATA_fh(;$) {
my $pkg = $_[0] // caller;
# Using a symbolic reference to avoid mentioning "DATA" at
# compile time, in case we are reading our own module's __DATA__
my $fh = do{ no strict 'refs'; *{"${pkg}::DATA"} };
use B;
$fh->input_line_number( B::svref_2object(\$fh)->LINE );
wantarray ? ($fh, B::svref_2object(\$fh)->FILE) : $fh
}
Usage examples:
my $fh = get_DATA_fh; # read my own __DATA__
while (<$fh>) { print "$. : $_"; }
or
my ($fh,$fname) = get_DATA_fh("Otherpackage");
while (<$fh>) {
print " $fname line $. : $_";
}
I need to open and use the contents of files as they are passed as a reference into my subroutine. I have researched and found
How to pass a file handle to a function?
but this did not work for me.
I have minimized the code considerably to get to the point. Here it is.
#!/usr/bin/env perl
use strict;
use warnings;
my @dirs = qw(/opt/r2configs/sapbid00/2018/03/15/lparstat-i.out /opt/r2configs/sapbiq00/2018/03/15/lparstat-i.out);
sub _ce {
die "no parameter!\n" unless @_;
my ($ce_ref) = @_;
for my $name (@{ $ce_ref }) {
print "$name\n";
chomp $name;
open (my $i, "+<", $name) or warn $!;
while (<$i>) {
print "$i,\n";
}
}
}
_ce( \@dirs );
# perl -c foo
foo syntax OK
root@r2nim01.xxx.com(/usr/local/scripts)$
# perl foo
/opt/r2configs/sapbid00/2018/03/15/lparstat-i.out
GLOB(0x20020148),
GLOB(0x20020148),
GLOB(0x20020148),
GLOB(0x20020148),
GLOB(0x20020148),
/opt/r2configs/sapbiq00/2018/03/15/lparstat-i.out
GLOB(0x20020148),
GLOB(0x20020148),
.
.
.
### The files do exist and I am root user. ###
# ls -l /opt/r2configs/sapbiq00/2018/03/15/lparstat-i.out
-rw-r--r-- 1 root system 2217 Mar 15 20:35 /opt/r2configs/sapbiq00/2018/03/15/lparstat-i.out
while (<$i>) {
print "$i,\n";
}
Here you are printing the filehandle for each line in the file. Print the line you read instead:
print "$_,\n";
Better yet:
while (my $line = <$i>) {
print "$line,\n";
}
You are outputting the file handle, not the content.
while (<$i>) {
print "$i,\n"; # <--- here
}
$i is the handle. Since you're reading without a variable, you need to look at $_.
print "$_,\n"; # <--- here
You might also want to name your handle $fh as per the convention. $i is typically used for a for loop iteration variable that counts something.
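Putting those suggestions together, a minimal sketch might look like the following. The helper name read_files and the temporary input file are assumptions made up for illustration; they are not from the question. The handle is opened read-only here, since the loop only reads.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical helper: takes a reference to an array of file names and
# returns every line of every file, with the newline removed.
sub read_files {
    my ($names_ref) = @_;
    my @lines;
    for my $name (@{ $names_ref }) {
        my $fh;
        unless (open $fh, '<', $name) {
            warn "Can't open $name: $!\n";
            next;
        }
        while (my $line = <$fh>) {
            chomp $line;
            push @lines, $line;    # keep the line that was read, not the handle
        }
        close $fh;
    }
    return @lines;
}

# Throwaway input file so the sketch runs anywhere.
my ($tmp_fh, $tmp_name) = tempfile( UNLINK => 1 );
print {$tmp_fh} "first\nsecond\n";
close $tmp_fh;

my @got = read_files( [ $tmp_name ] );
print "$_,\n" for @got;
```

Naming the loop variable $line keeps it distinct from the handle $fh, which is the mix-up the question ran into.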
I am writing a script in Perl where I have to open the same file twice in my code. This is my outline of the code:
#!/usr/bin/perl
use strict;
use warnings;
my %forward=();
my %reverse=();
while(<>){
chomp;
# store something
}
while(<>){ # open the same file again
chomp;
#print something
}
I am using the diamond operator so I am running the script like this
perl script.pl input.txt
But this is not producing any output. If I open the File using filehandle, the script works. What can be possibly wrong here?
Save your @ARGV before exhausting it. Of course, this will only work for actual files specified on the command line, and not with STDIN.
#!/usr/bin/env perl
use strict;
use warnings;
run(@ARGV);
sub run {
my @argv = @_;
first(@argv);
second(@argv);
}
sub first {
local @ARGV = @_;
print "First pass: $_" while <>;
}
sub second {
local @ARGV = @_;
print "Second pass: $_" while <>;
}
You read all there was to be read in the first loop, leaving nothing to read in the second.
If the input isn't huge, you can simply load it into memory.
my @lines = <>;
chomp( @lines );
for (@lines) {
...
}
for (@lines) {
...
}
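If the input is too large to hold in memory, another option is to open the file yourself and rewind the handle between passes. This is only a sketch: the temporary file stands in for input.txt, and it assumes a seekable regular file (it will not work for a pipe or STDIN).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );
use File::Temp qw( tempfile );

# Throwaway input file standing in for input.txt.
my ($tmp_fh, $path) = tempfile( UNLINK => 1 );
print {$tmp_fh} "a\nb\n";
close $tmp_fh;

open(my $fh, '<', $path) or die "Can't open $path: $!\n";

# First pass: store something (here, just count the lines).
my $count = 0;
while (my $line = <$fh>) {
    $count++;
}

# Rewind to the start of the file instead of reopening it.
seek($fh, 0, SEEK_SET) or die "Can't rewind $path: $!\n";
$. = 0;    # seek does not reset the line counter, so reset it by hand

# Second pass: print something.
my @second;
while (my $line = <$fh>) {
    chomp $line;
    push @second, "$.: $line";
}
close $fh;

print "$count lines; second pass saw: @second\n";
```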
I am attempting to loop through a directory and all of its sub-directories to see if the files within those directories are a certain size. But I am not sure whether the entries in the @files array still carry the file size so that I can compare it (i.e., size <= value_size). Can someone offer any guidance?
use strict;
use warnings;
use File::Find;
use DateTime;
my @files;
my $dt = DateTime->now;
my $date = $dt->ymd;
my $start_dir = "/apps/trinidad/archive/in/$date";
my $empty_file = 417;
find( \&wanted, $start_dir);
for my $file( @files )
{
if(`ls -ltr | awk '{print $5}'`<= $empty_file)
{
print "The file $file appears to be empty please check within the folder if this is empty"
}
else
return;
}
exit;
sub wanted {
push @files, $File::Find::name unless -d;
return;
}
I think you could use this code instead of shelling out to awk.
(I don't understand why 417, the value of $empty_file, counts as an empty file size.)
if (-s $file <= $empty_file)
Also notice that you are missing an open and close brace for your else branch.
(I'm unsure why you want to return as soon as the first file that is not 'empty' is found: return is only valid inside a subroutine, and at the top level of a script Perl dies with "Can't return outside a subroutine".)
The exit is unnecessary, and so is the return in the wanted function.
Update: A File::Find::Rule solution could be used. Here is a small program that captures all files smaller than 14 bytes in my current directory and all of its subdirectories.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use File::Find::Rule;
my $dir = '.';
my @files = find( file => size => "<14", in => $dir);
say -s $_, " $_" for @files;
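If File::Find::Rule is not installed, roughly the same thing can be sketched with the core File::Find module by recording each file's size while the tree is being walked. The %size_of hash and the temporary directory tree are assumptions for illustration; 417 is the threshold from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use File::Temp qw( tempdir );

my $max_size = 417;    # threshold taken from the question

# Throwaway directory tree so the sketch runs anywhere.
my $dir = tempdir( CLEANUP => 1 );
mkdir "$dir/sub" or die "Can't mkdir $dir/sub: $!\n";
for my $spec ( [ "$dir/small.txt", 10 ], [ "$dir/sub/big.txt", 1000 ] ) {
    my ($path, $bytes) = @$spec;
    open(my $fh, '>', $path) or die "Can't write $path: $!\n";
    print {$fh} 'x' x $bytes;
    close $fh;
}

# Record the size of every plain file as it is found.
my %size_of;
find( sub {
    return unless -f;                      # stats $_, the bare file name
    $size_of{$File::Find::name} = -s _;    # "_" reuses that stat result
}, $dir );

my @small = grep { $size_of{$_} <= $max_size } sort keys %size_of;
print "The file $_ ($size_of{$_} bytes) appears to be empty\n" for @small;
```

The file names alone do not carry a size, which is why the size is looked up with -s, either during the walk as here or afterwards on each stored path.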
This Perl script traverses all directories and subdirectories, searching for a file named RUN. It then opens the file and runs the first line written in it. The problem is that I am unable to redirect the output of the system command to a file named error.log and STDERR to another file named test_file.errorlog; no such files are created.
Note that all variables are declared, even if the declarations are not shown here.
find (\&pickup_run,$path_to_search);
### Subroutine for extracting path of directories with RUN FILE PRESENT
sub pickup_run {
if ($File::Find::name =~/RUN/) {
### If RUN file is present , push it into array named run_file_present
push(@run_file_present,$File::Find::name);
}
}
###### Iterate over the array containing paths to directories containing RUN files one by one
foreach my $var (@run_file_present) {
$var =~ s/\//\\/g;
($path_minus_run=$var) =~ s/RUN\b//;
#print "$path_minus_run\n";
my $test_case_name;
($test_case_name=$path_minus_run) =~ s/expression to be replaced//g;
chdir "$path_minus_run";
########While iterating over the paths, open each file
open data, "$var";
#####Run the first two lines containing commands
my @lines = <data>;
my $return_code=system (" $lines[0] >error.log 2>test_file.errorlog");
if($return_code) {
print "$test_case_name \t \t FAIL \n";
}
else {
print "$test_case_name \t \t PASS \n";
}
close (data);
}
The problem is almost certainly that $lines[0] has a newline at the end after being read from the file, so the redirections end up on a separate line from the command.
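To see the effect of that trailing newline, here is a small sketch that only builds the command strings and does not run anything. The command "make test" is a made-up stand-in for whatever the first line of a RUN file contains.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical first line of a RUN file, exactly as a readline returns it.
my $line = "make test\n";

# Without chomp, the newline sits between the command and the redirections.
my $broken = "$line >error.log 2>test_file.errorlog";

# With chomp, everything stays on one line.
chomp( my $cmd = $line );
my $fixed = "$cmd >error.log 2>test_file.errorlog";

my @broken_parts = split /\n/, $broken;
print scalar(@broken_parts), " lines reach the shell without chomp\n";
print "1 line reaches the shell after chomp: $fixed\n";
```

In the unchomped string the command and its redirections are on different lines, so the redirections no longer belong to the command.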
But there are several improvements you could make:
Always use strict and use warnings at the top of every Perl program, and declare all your variables using my as close as possible to their first point of use.
Use the three-parameter form of open and always check whether it succeeded, putting the built-in variable $! into your die string to say why it failed. You can also use autodie to save writing the code for this manually for every open, but it requires Perl v5.10.1 or better.
You shouldn't put quotes around scalar variables -- just use them as they are. So chdir $path_minus_run and open data, $var are correct.
There is also no need to save all the files to be processed and deal with them later. Within the wanted subroutine, File::Find sets you up with $File::Find::dir set to the directory containing the file, and $_ set to the bare file name without a path. It also does a chdir to the directory for you, so the context is ideal for processing the file.
use strict;
use warnings;
use v5.10.1;
use autodie;
use File::Find;
my $path_to_search;
find( \&pickup_run, $path_to_search );
sub pickup_run {
return unless -f and $_ eq 'RUN';
my $cmd = do {
open my $fh, '<', $_;
<$fh>;
};
chomp $cmd;
( my $test_name = $File::Find::dir ) =~ s/expression to be replaced//g;
my $retcode = system( "$cmd >error.log 2>test_file.errorlog" );
printf "%s\t\t%s\n", $test_name, $retcode ? 'FAIL' : 'PASS';
}