Is there a way of dying in Perl such that $@ is not modified? As far as I can tell, die mangles the contents of $@ under certain circumstances and "throws", transferring control non-locally. I just want to do the latter.
die appends the file name and line number to $@ when the exception-ish object passed to it is a string (or undef).
For example,
#!/usr/bin/env perl
# foo.pl
use strict;
use warnings;
use Data::Dumper;
eval { die '1'; };
my $hash_ref = {
msg => $@,
};
print Dumper($hash_ref);
prints:
$VAR1 = {
'msg' => '1 at foo.pl line 7.
'
};
But if the argument is not a string (or undef) it isn't modified.
#!/usr/bin/env perl
# foo2.pl
use strict;
use warnings;
use Data::Dumper;
eval { die ['string-inside-arrayref']; };
my $hash_ref = {
msg => $@,
};
print Dumper($hash_ref);
This snippet produces:
$VAR1 = {
'msg' => [
'string-inside-arrayref'
]
};
perldoc -f die:
die LIST
die raises an exception. [...]
If the last element of LIST does not end in a newline, the current script line number and input line number (if any) are also printed, and a newline is supplied.
If you want to suppress the addition of file name and line number, make sure your error message ends with "\n".
To rethrow an existing exception, you can simply use die $@. Either $@ is an exception object (then it won't get mangled anyway), or it is a string ending with "\n" (because the previous die will have made sure to add one if there wasn't one to begin with).
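The behaviour described above can be checked with a small sketch. It uses only core Perl; the helper name caught is made up for illustration and is not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper: run a code ref and return whatever ended up in $@
# (or undef if nothing was thrown).
sub caught {
    my ($code) = @_;
    my $ok = eval { $code->(); 1 };
    return $ok ? undef : $@;
}

# No trailing newline: die appends " at FILE line N.\n".
my $plain = caught(sub { die "oops" });

# Trailing newline: the message is left exactly as given.
my $newline = caught(sub { die "oops\n" });

# Rethrow: $@ already ends in "\n", so the second die adds nothing.
my $rethrow = caught(sub {
    eval { die "inner" };
    die $@;
});

print "plain:   $plain";
print "newline: $newline";
print "rethrow: $rethrow";
```

The rethrown message carries a single location suffix, the one added by the first die.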
Related
Does STDOUT have a "type"?
printf STDERR ("STDOUT = %s\n", STDOUT);
printf STDERR ("\*STDOUT = %s\n", *STDOUT);
printf STDERR ("\\\*STDOUT = %s\n", \*STDOUT);
Produces:
STDOUT = STDOUT
*STDOUT = *main::STDOUT
\*STDOUT = GLOB(0x600078848)
I understand the *main::STDOUT and GLOB(0x600078848) entries. The "bareword" one leaves me curious.
I'm asking because I want to pass a file-handle-like argument to a method call. In C, I'd use a file descriptor or a FILE *. I want it to default to STDOUT. What I've done is:
$OUT_FILE_HANDLE = \*STDOUT;
if (@ARGV > 0) {
open($OUT_FILE_HANDLE, ">", "$ARGV[0]") or die $!;
}
It works, but I don't know exactly what I've done. Have I botched up STDOUT? I suspect I have "ruined" (overwritten) STDOUT, which is NOT what I want.
Please pardon the compound question; they seemed related.
Create a lexical filehandle to be a copy of STDOUT and manipulate that as needed
sub manip_fh {
my ($fh) = @_;
say $fh "hi"; # goes to STDOUT
open $fh, '>', 'a_file.txt' or die $!; # reopen the same handle: now it's to a file
say $fh "hello";
}
open my $fh, '>&', STDOUT; # via dup2
manip_fh($fh);
say "hi"; # still goes where STDOUT went before being dup-ed (terminal)
This new, independent, filehandle can then be reopened to another resource without affecting STDOUT. See open.
The $OUT_FILE_HANDLE = \*STDOUT; from the question creates an alias and so the STDOUT does indeed get changed when the "new" one changes. You can see that by printing the typeglob
our $NEW = \*STDOUT; # "our" only for checks here, otherwise better "my"
say *{$main::NEW}; #--> *main::STDOUT
or by printing the IO slot from the symbol table for both
say for *{$main::NEW}{IO}, *{$main::{STDOUT}}{IO};
and seeing (that the object stringifies to) the same (eg IO::File=IO(0x1a8ca50)).
When it's duped using open with mode >& as in the first code snippet (but as global our) it prints *main::NEW, and its IO::File object is not the same as for STDOUT. (Make it a global our so that it is in the symbol table for these checks, but not for real use; it's much better having a my.)
From perlvar:
Perl identifiers that begin with digits or punctuation characters are exempt from the effects of the package declaration and are always forced to be in package main; they are also exempt from strict 'vars' errors. A few other names are also exempt in these ways: [...] STDOUT
So, STDOUT is a global variable containing a pre-opened file handle.
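A quick way to see the perlvar rule in action is to look at STDOUT from inside another package. This is only a sketch; the package name is hypothetical.

```perl
use strict;
use warnings;

package Some::Other::Package;   # hypothetical package name

# STDOUT is forced into package main, so both names denote the same glob.
my $same = \*STDOUT == \*main::STDOUT;

# Stringifying the glob shows the package it really lives in.
my $name = "" . *STDOUT;

print $same ? "same glob\n" : "different globs\n";
print "$name\n";
```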
From perlfunc:
If FILEHANDLE is an undefined scalar variable (or array or hash element), a new filehandle is autovivified, meaning that the variable is assigned a reference to a newly allocated anonymous filehandle. Otherwise if FILEHANDLE is an expression, its value is the real filehandle.
Your $OUT_FILE_HANDLE is not undefined, so it is its value, STDOUT, that is being opened. AFAIK, if you open an already open handle, it is implicitly closed first.
There are several ways to do what you want. The first is obvious from the above quote — do not define $OUT_FILE_HANDLE before the open:
if (@ARGV > 0) {
open($OUT_FILE_HANDLE, ">", "$ARGV[0]") or die $!;
} else {
$OUT_FILE_HANDLE = \*STDOUT;
}
# do stuff to $OUT_FILE_HANDLE
Another is to use select, so you don't need to pass a file handle:
if (@ARGV > 0) {
open($OUT_FILE_HANDLE, ">", "$ARGV[0]") or die $!;
select $OUT_FILE_HANDLE;
}
# do stuff (without specifying a file handle)
select STDOUT;
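The first approach can also be wrapped in a small function, so the caller always gets a handle back and STDOUT itself is never reopened. This is a sketch under that assumption; open_output is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return a handle for writing to $path,
# or a reference to STDOUT when no path is given.
sub open_output {
    my ($path) = @_;
    return \*STDOUT unless defined $path;
    open(my $fh, '>', $path) or die "Can't open '$path' for writing: $!\n";
    return $fh;
}

# With no argument the output goes to STDOUT; passing a path
# (for example $ARGV[0]) would send it to that file instead.
my $out = open_output();
print {$out} "report line\n";
print "still on STDOUT\n";   # STDOUT was only referenced, never reopened
```

Because a fresh lexical handle is autovivified for the file case, the open call never sees a variable that already holds \*STDOUT.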
This part of your question wasn't answered:
The "bareword" one leaves me curious.
An identifier with no other meaning is a string literal that produces itself.[1] For example, foo is the same as 'foo'.
$ perl -e'my $x = foo; print "$x\n";'
foo
This is error-prone, so we use use strict qw( subs ); to prevent this.
$ perl -e'use strict; my $x = foo; print "$x\n";'
Bareword "foo" not allowed while "strict subs" in use at -e line 1.
Execution of -e aborted due to compilation errors.
See this for other meanings Perl could assign.
Is there a way to get the line number (and maybe filename) where a __DATA__ token was coded? Or some other way to know the actual line number in the original source file where a line of data read from the DATA filehandle came from?
Note that $. counts from 1 when reading from the DATA filehandle. So if the line number of the __DATA__ token were added to $. it would be what I'm looking for.
For example:
#!/usr/bin/perl
while (<DATA>) {
my $n = $. + WHAT??;
die "Invalid data at line $n\n" if /bad/;
}
__DATA__
something good
something bad
I want this to say "Invalid data at line 9", not "line 2" (which is what you get if $. is used by itself).
In systems that support /proc/<pid> virtual filesystems (e.g., Linux), you can do:
# find the file where <DATA> handle is read from
my $DATA_FILE = readlink("/proc/$$/fd/" . fileno(*DATA));
# find the line where DATA begins
open my $THIS, "<", $DATA_FILE;
my @THIS = <$THIS>;
my ($DATA_LINE) = grep { $THIS[$_] =~ /^__DATA__\b/ } 0 .. $#THIS;
Files don't actually have lines; they're just sequences of bytes. The OS doesn't even offer the capability of getting a line from a file, so it has no concept of line numbers.
Perl, on the other hand, does keep track of a line number for each handle. It is accessed via $..
However, the Perl handle DATA is created from a file descriptor that's already been moved to the start of the data —it's the file descriptor that Perl itself uses to load and parse the file— so there's no record of how many lines have already been read. So the line 1 of DATA is the first line after __DATA__.
To correct the line count, one must seek back to the start of the file, and read it line by line until the file handle is back at the same position it started.
#!/usr/bin/perl
use strict;
use warnings qw( all );
use Fcntl qw( SEEK_SET );
# Determines the line number at the current file position without using «$.».
# Corrects the value of «$.» and returns the line number.
# Sets «$.» to «1» and returns «undef» if unable to determine the line number.
# The handle is left pointing to the same position as when this was called, or this dies.
sub fix_line_number {
my ($fh) = @_;
( my $initial_pos = tell($fh) ) >= 0
or return undef;
seek($fh, 0, SEEK_SET)
or return undef;
$. = 1;
while (<$fh>) {
( my $pos = tell($fh) ) >= 0
or last;
if ($pos >= $initial_pos) {
if ($pos > $initial_pos) {
seek($fh, $initial_pos, SEEK_SET)
or die("Can't reset handle: $!\n");
}
return $.;
}
}
seek($fh, $initial_pos, SEEK_SET)
or die("Can't reset handle: $!\n");
$. = 1;
return undef;
}
my $prefix = fix_line_number(\*DATA) ? "" : "+";
while (<DATA>) {
printf "%s:%s: %s", __FILE__, "$prefix$.", $_;
}
__DATA__
foo
bar
baz
Output:
$ ./a.pl
./a.pl:48: foo
./a.pl:49: bar
./a.pl:50: baz
$ perl <( cat a.pl )
/dev/fd/63:+1: foo
/dev/fd/63:+2: bar
/dev/fd/63:+3: baz
Perl keeps track of the file and line at which each symbol is created. A symbol is normally created when the parser/compiler first encounters it. But if __DATA__ is encountered before DATA is otherwise created, this will create the symbol. We can take advantage of this to set the line number associated with the file handle in DATA.
For the case where the Package::DATA handle is not used in Package.pm itself, the line number of the __DATA__ token could be obtained via B::GV->LINE on the DATA handle:
$ cat Foo.pm
package Foo;
1;
__DATA__
good
bad
$ perl -I. -MFoo -MB -e '
my $ln = B::svref_2object(\*Foo::DATA)->LINE;
warn "__DATA__ at line $ln\n";
Foo::DATA->input_line_number($ln);
while(<Foo::DATA>){ die "no good" unless /good/ }
'
__DATA__ at line 4
no good at -e line 1, <DATA> line 6.
In the case where the DATA handle is referenced in the file itself, a possible kludge would be to use an @INC hook:
$ cat DH.pm
package DH;
unshift @INC, sub {
my ($sub, $fname) = @_;
for(@INC){
if(open my $fh, '<', my $fpath = "$_/$fname"){
$INC{$fname} = $fpath;
return \'', $fh, sub {
our (%ln, %pos);
if($_){ $pos{$fname} += length; ++$ln{$fname} }
}
}
}
};
$ cat Bar.pm
package Bar;
print while <DATA>;
1;
__DATA__
good
bad
$ perl -I. -MDH -MBar -e '
my $fn = "Bar.pm";
warn "__DATA__ at line $DH::ln{$fn} pos $DH::pos{$fn}\n";
seek Bar::DATA, $DH::pos{$fn}, 0;
Bar::DATA->input_line_number($DH::ln{$fn});
while (<Bar::DATA>){ die "no good" unless /good/ }
'
good
bad
__DATA__ at line 6 pos 47
no good at -e line 6, <DATA> line 8.
Just for the sake of completion, in the case where you do have control over the file, all could be easily done with:
print "$.: $_" while <DATA>;
BEGIN { our $ln = __LINE__ + 1; DATA->input_line_number($ln) }
__DATA__
...
You can also use the first B::GV solution, provided that you reference the DATA handle via an eval:
use B;
my ($ln, $data) = eval q{B::svref_2object(\*DATA)->LINE, \*DATA}; die $@ if $@;
$data->input_line_number($ln);
print "$.: $_" while <$data>;
__DATA__
...
None of these solutions assumes that the source files are seekable (except if you want to read the DATA more than once, as I did in the second example), or tries to reparse your files, etc.
Comparing the end of the file to itself in reverse might do what you want:
#!/usr/bin/perl
open my $f, "<", $0;
my @lines;
my @dataLines;
push @lines, $_ while <$f>;
close $f;
push @dataLines, $_ while <DATA>;
my @revLines = reverse @lines;
my @revDataLines = reverse @dataLines;
my $count = @lines;
my $offset=0;
$offset++ while ($revLines[$offset] eq $revDataLines[$offset]);
$count-=$offset;
print "__DATA__ section is at line $count\n";
__DATA__
Hello there
"Some other __DATA__
lkjasdlkjasdfklj
ljkasdf
Running it gives this output:
__DATA__ section is at line 19
The above script reads itself (using $0 for the file name) into the @lines array and reads the DATA file into the @dataLines array.
The arrays are reversed and then compared element by element until they are different. The number of lines are tracked in $offset and this is subtracted from the $count variable which is the number of lines in the file.
The result is the line number the DATA section starts at. Hope that helps.
Thank you @mosvy for the clever and general idea.
Below is a consolidated solution which works anywhere. It uses a symbolic reference instead of eval to avoid mentioning "DATA" at compile time, but otherwise uses the same ideas as mosvy.
The important point is that code in a package containing __DATA__ must not refer to the DATA symbol by name so that that symbol won't be created until the compiler sees the __DATA__ token. The way to avoid mentioning DATA is to use a filehandle ref created at run-time.
# Get the DATA filehandle for a package (default: the caller's),
# fixed so that "$." provides the actual line number in the
# original source file where the last-read line of data came
# from, rather than counting from 1.
#
# In scalar context, returns the fixed filehandle.
# In list context, returns ($fh, $filename)
#
# For this to work, a package containing __DATA__ must not
# explicitly refer to the DATA symbol by name, so that the
# DATA symbol (glob) will not yet be created when the compiler
# encounters the __DATA__ token.
#
# Therefore, use the filehandle ref returned by this
# function instead of DATA!
#
sub get_DATA_fh(;$) {
my $pkg = $_[0] // caller;
# Using a symbolic reference to avoid mentioning "DATA" at
# compile time, in case we are reading our own module's __DATA__
my $fh = do{ no strict 'refs'; *{"${pkg}::DATA"} };
use B;
$fh->input_line_number( B::svref_2object(\$fh)->LINE );
wantarray ? ($fh, B::svref_2object(\$fh)->FILE) : $fh
}
Usage examples:
my $fh = get_DATA_fh; # read my own __DATA__
while (<$fh>) { print "$. : $_"; }
or
my ($fh,$fname) = get_DATA_fh("Otherpackage");
while (<$fh>) {
print " $fname line $. : $_";
}
I am a newbie to Perl, and this is my second assignment: I should create a program that parses n files and prints m sentences using an n-gram model. Long story short, I wrote this script, which takes n arguments, where the first and second arguments are numeric and the rest are file names. However, I am getting this error: Wide character in print at ngram.pl line 35, <$fh> line 1.
Steps to reproduce it:
Input from the command line: perl ngram.pl 5 10 tale-cities.txt bleak-house.txt papers.txt
Output: Wide character in print at ngram.pl line 35, <$fh> line 1.
#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';
use Scalar::Util qw(looks_like_number);
use utf8;
use Encode;
#Charles Dickens
sub checkIfNumberic
{
my ($inp) = @_;
if (looks_like_number($inp)){
return "True";
}
else{
return "False" ;
}
}
sub main
{
my $correctInput=", your input must be something like this 5 10 somefile.txt somefile2.txt ";
my @inputs = @ARGV;
if (checkIfNumberic($inputs[0]) eq "False"){
die "first argument must be numberic $correctInput\n";
}
if (checkIfNumberic($inputs[1]) eq "False"){
die "second argument must be numberic $correctInput\n";
}
for (my $i=2; $i < scalar @inputs; $i++)
{
if (open(my $fh, '<:encoding(UTF-8)', $inputs[$i])) {
while (my $line = <$fh>) {
chomp $line;
print "$line \n";
}
}
}
}
main();
You decoded your inputs (the script, with use utf8;; and the file, with :encoding(UTF-8)), but you didn't encode your outputs. Add
use open ':std', ':encoding(UTF-8)';
This is equivalent to
BEGIN {
binmode STDIN, ':encoding(UTF-8)';
binmode STDOUT, ':encoding(UTF-8)';
binmode STDERR, ':encoding(UTF-8)';
}
It also sets the default encoding for file handles opened in its lexical scope, so you can remove the existing :encoding(UTF-8) if you want.
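To see what an encoding layer does on output, one can print a decoded string into an in-memory handle and compare lengths. This is a minimal sketch, not the asker's program; the euro sign is just an arbitrary character above code point 255, the kind that triggers "Wide character" on a handle without a layer.

```perl
use strict;
use warnings;

# A decoded (character) string: EURO SIGN followed by "5", two characters.
my $text = "\x{20AC}5";

# Print it through an :encoding(UTF-8) layer into an in-memory buffer.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer)
    or die "Can't open in-memory handle: $!\n";
print {$out} $text;
close($out) or die "Can't close in-memory handle: $!\n";

# The buffer now holds the UTF-8 bytes, so it is longer than the string.
printf "characters: %d, encoded bytes: %d\n", length($text), length($buffer);
```

The same layer on STDOUT, set by binmode or by use open ':std', is what makes the warning go away.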
I am using reval from Perl's Safe module and I want to prevent it from generating warnings if the string being eval'ed can't be parsed (actually, I want to prevent it from generating any warnings at all).
For example, the following code:
use strict; use warnings;
use Safe;
use feature qw/say/;
my $cft = Safe->new;
my $x = $cft->reval(') 1' );
my $y = $cft->reval('2' );
say "x: $x";
say "y: $y";
results in:
Number found where operator expected at (eval 5) line 1, near ") 1"
(Missing operator before 1?)
Use of uninitialized value $x in concatenation (.) or string at ./test line 12.
x:
y: 2
What I'm trying to achieve is to have $x = undef and $y = 2, and no warnings.
I tried to put a "no warnings;" inside a new scope, but it has no effect on the warnings produced from within the reval (although, as pointed out by @DavidO, it silences the 'uninitialized value' warning):
use strict; use warnings;
use Safe;
use feature qw/say/;
my $cft = Safe->new;
{
no warnings;
my $x = $cft->reval(') 1' );
my $y = $cft->reval('2' );
say "x: $x";
say "y: $y";
}
I guess that somehow the 'no warnings' has to be inside the Safe compartment, so I also tried to prepend "no warnings;" to the strings being eval'ed:
use strict; use warnings;
use Safe;
use feature qw/say/;
my $cft = Safe->new;
{
my $x = $cft->reval( 'no warnings;' . ') 1' );
my $y = $cft->reval( 'no warnings;' . '2' );
say "x: $x";
say "y: $y";
}
This way reval does not issue any warnings, but both variables are undef:
Use of uninitialized value $x in concatenation (.) or string at ./test line 10.
x:
Use of uninitialized value $y in concatenation (.) or string at ./test line 11.
y:
I don't know what else to try, and I hope that the problem description was clear enough.
If you check $@ you'll see that $cft->reval( 'no warnings;' . ') 1' ); failed with 'require' trapped by operation mask at (eval 5) line 1. In other words, Safe is doing its job and preventing that code from trying to load a library.
$cft->reval( 'BEGIN { warnings->unimport; } ) 1' ); would work, presuming warnings is already loaded outside the compartment. However, that won't quiet compile time errors. Unlike eval, reval seems to let them through. Use amon's technique of quieting STDERR.
no warnings suppresses all the warnings the use warnings pragma generates. You would probably want to remove any strictures as well. But severe parsing errors will pop up anyway.
If you want to execute any code, no matter how pathological, without any output to STDERR, you should locally modify the signal handler:
{
# I know what I'm doing!
local $SIG{__WARN__} = sub {}; # locally ignore any warnings
eval $code; # catches all "die"
}
or we could reopen STDERR to /dev/null:
{
# I know what I'm doing!
open my $oldSTDERR, '>&', \*STDERR or die;
close STDERR or die;
open STDERR, '>', '/dev/null' or die;
eval $code;
close STDERR or die;
open STDERR, '>&', $oldSTDERR or die;
close $oldSTDERR;
}
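A variation on the signal-handler approach is to collect the warnings instead of discarding them, so they can still be inspected afterwards. The sketch below uses a plain string eval rather than Safe's reval, and quiet_eval is a hypothetical helper name. It only demonstrates warnings raised at run time and makes no claim about parser diagnostics.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluate a string of trusted code while capturing
# any warnings. Returns (value, error, array ref of captured warnings).
sub quiet_eval {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $value = eval $code;   # string eval: never use on untrusted input
    my $error = $@;
    return ($value, $error, \@warnings);
}

my ($value, $error, $warnings) = quiet_eval('warn "noisy\n"; 2');
print "value: $value, warnings captured: ", scalar(@$warnings), "\n";

($value, $error) = quiet_eval('die "boom\n"');
print "error: $error";
```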
I have some code that looks like
my ($ids,$nIds);
while (<myFile>){
chomp;
$ids.= $_ . " ";
$nIds++;
}
This should concatenate every line in my myFile, and nIds should be my number of lines. How do I print out my $ids and $nIds?
I tried simply print $ids, but Perl complains.
my ($ids, $nIds)
is a list, right? With two elements?
print "Number of lines: $nIds\n";
print "Content: $ids\n";
How did Perl complain? print $ids should work, though you probably want a newline at the end, either explicitly with print as above or implicitly by using say or -l/$\.
If you want to interpolate a variable in a string and have something immediately after it that would look like part of the variable name but isn't, enclose the variable name in {}:
print "foo${ids}bar";
You should always include all relevant code when asking a question. In this case, the print statement that is the center of your question. The print statement is probably the most crucial piece of information. The second most crucial piece of information is the error, which you also did not include. Next time, include both of those.
print $ids should be a fairly hard statement to mess up, but it is possible. Possible reasons:
1. $ids is undefined. Gives the warning undefined value in print.
2. $ids is out of scope. With use strict, gives fatal warning Global variable $ids needs explicit package name, and otherwise the undefined warning from above.
3. You forgot a semi-colon at the end of the line.
4. You tried to do print $ids $nIds, in which case perl thinks that $ids is supposed to be a filehandle, and you get an error such as print to unopened filehandle.
Explanations
1: Should not happen. It might happen if you do something like this (assuming you are not using strict):
my $var;
while (<>) {
$Var .= $_;
}
print $var;
Gives the warning for undefined value, because $Var and $var are two different variables.
2: Might happen, if you do something like this:
if ($something) {
my $var = "something happened!";
}
print $var;
my declares the variable inside the current block. Outside the block, it is out of scope.
3: Simple enough, common mistake, easily fixed. Easier to spot with use warnings.
4: Also a common mistake. There are a number of ways to correctly print two variables in the same print statement:
print "$var1 $var2"; # interpolation inside a double quoted string
print $var1 . $var2; # concatenation
print $var1, $var2; # supplying print with a list of args
Lastly, some perl magic tips for you:
use strict;
use warnings;
# open with explicit direction '<', check the return value
# to make sure open succeeded. Using a lexical filehandle.
open my $fh, '<', 'file.txt' or die $!;
# read the whole file into an array and
# chomp all the lines at once
chomp(my @file = <$fh>);
close $fh;
my $ids = join(' ', @file);
my $nIds = scalar @file;
print "Number of lines: $nIds\n";
print "Text:\n$ids\n";
Reading the whole file into an array is suitable for small files only, otherwise it uses a lot of memory. Usually, line-by-line is preferred.
Variations:
print "@file" is equivalent to $ids = join(' ', @file); print $ids;
$#file will return the last index in @file. Since arrays usually start at 0, $#file + 1 is equivalent to scalar @file.
You can also do:
my $ids;
do {
local $/;
$ids = <$fh>;
};
By temporarily "turning off" $/, the input record separator, i.e. newline, you will make <$fh> return the entire file. What <$fh> really does is read until it finds $/, then return that string. Note that this will preserve the newlines in $ids.
Line-by-line solution:
open my $fh, '<', 'file.txt' or die $!; # btw, $! contains the most recent error
my $ids;
while (<$fh>) {
chomp;
$ids .= "$_ "; # concatenate with string
}
my $nIds = $.; # $. is Current line number for the last filehandle accessed.
How do I print out my $ids and $nIds?
print "$ids\n";
print "$nIds\n";
I tried simply print $ids, but Perl complains.
Complains about what? Uninitialised value? Perhaps your loop was never entered due to an error opening the file. Be sure to check if open returned an error, and make sure you are using use strict; use warnings;.
my ($ids, $nIds) is a list, right? With two elements?
It's a (very special) function call. $ids,$nIds is a list with two elements.
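For what it's worth, here is a small sketch of how my with a parenthesised list behaves in an assignment. The values are made up for illustration.

```perl
use strict;
use warnings;

# my (...) declares several variables at once; assigning a list
# fills them in order.
my ($ids, $nIds) = ("a b c", 3);

# An array on the left-hand side soaks up whatever values remain.
my ($first, @rest) = (10, 20, 30);

# In scalar context, a list assignment yields the number of
# values on the right-hand side.
my $count = () = (10, 20, 30);

print "ids=$ids nIds=$nIds first=$first rest=@rest count=$count\n";
```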