Perl HTML::Parser error: Undefined subroutine &main::1

I am getting the error
Undefined subroutine &main::1 called at /usr/local/lib/perl/5.10.0/HTML/Parser.pm line 102.
Here is my code
#open (IN, "<", "foo.html") or die "can't open source file: $!";
my $p = HTML::Parser->new( api_version => 3,
start_h => [&start, "tagname, attr, text"],
text_h => [&text, "text"],
default_h => [sub { print OUT shift }, "text"],
);
$p->utf8_mode;
$p->empty_element_tags;
$p->ignore_elements(qw(br));
$p->parse_file("foo.html") or die "parsing failed: $!";
#while (<IN>) {
# $p->parse($_) || die "parsing failed: $!";
#}
#$p->eof;
#close IN;
As you can see in the commented out parts I have also tried directly opening and calling parse (with equally little luck).
The file does open fine.
Parser.pm line 102, which the error mentions, is in the parse_file subroutine, specifically the line calling ->parse.
I have no clue where parse is defined: it is not in HTML::Parser, nor did I find it in HTML::Entities, the only dependency HTML::Parser has. =/ I am afraid I am lost at this point; the deepest magics of Perl are still a mystery to me.

Try using \&start and \&text:
my $p = HTML::Parser->new( api_version => 3,
start_h => [\&start, "tagname, attr, text"],
text_h => [\&text, "text"],
default_h => [sub { print OUT shift }, "text"],
);
Otherwise you are passing the result of calling start() and text(), not references to them as subs.
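To see the difference without HTML::Parser installed, here is a minimal core-Perl sketch (the @calls log is only there for illustration). \&start yields a code reference that can be called later, the way HTML::Parser calls its handlers, while &start runs the sub on the spot and hands back whatever it returns.

```perl
use strict;
use warnings;

my @calls;    # records every invocation of start()

sub start {
    push @calls, "start(@_)";
    return 1;    # a typical true return value, like the 1 in the error
}

# \&start: a reference to the sub; nothing runs yet.
my $ref = \&start;

# &start: calls start() right now (reusing the current @_)
# and yields its return value, not a code reference.
my $result = &start;

print ref($ref), "\n";    # CODE
print "$result\n";        # 1

# The reference behaves like a callback: it runs only when invoked.
$ref->('p', 'class');
print "$calls[-1]\n";     # start(p class)
```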

In the documentation it says you should use \&start. If you leave out the backslash, it will use the return value of the function start instead (calling it with the current @_ as its argument list, which is what normally happens when a subroutine is called with & and no parentheses). This value could be 1.
Here is an example:
C:\perl>perl -we "$c=\&s; sub s { print 'yada' }; $c->();"
yada
C:\perl>perl -we "$c=&s; sub s { print 'yada' }; $c->();"
Undefined subroutine &main::1 called at -e line 1.
yada
Not sure why the error turns up there, but you might change it, see if it helps.
Oh, also, it does seem like you are not using use strict. When using strict, I get a much more helpful error:
C:\perl>perl -we "use strict; my $c=&s; sub s { print 'a' }; $c->();"
Can't use string ("1") as a subroutine ref while "strict refs" in use at -e line
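A small sketch of both behaviours side by side; the sub is named yada here (a hypothetical stand-in for the answer's s, to avoid confusion with the s/// operator). Without strict refs, Perl looks up a sub literally named 1; with strict refs it refuses to treat the string as a code reference at all.

```perl
use strict;
use warnings;

sub yada { return 1 }

my $c = &yada;    # oops: calls yada() and stores its return value, 1

# With strict refs (enabled by "use strict"):
my $strict_ok    = eval { $c->(); 1 };
my $strict_error = $@;

# Without strict refs, "1" is used as a symbolic sub name.
my $loose_ok    = eval { no strict 'refs'; $c->(); 1 };
my $loose_error = $@;

print "strict:    $strict_error";
print "no strict: $loose_error";
```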

Related

Perl: Signal handler: Can't use an undefined value as a symbol reference

I have the following:
#! /usr/bin/perl
use Fcntl ':flock';
use strict;
...
my (@list, $fh);
open $fh, "<:encoding(utf8)", $file or die "$file: $!";
flock $fh, LOCK_EX;
use sigtrap 'handler' => sub {flock($fh, LOCK_UN);}, 'normal-signals'; # line 72
It runs but gives an error when I press ^C:
$ verify.pl
...
Can't use an undefined value as a symbol reference at ./verify.pl line 72.
The docs say it's OK:
$ perldoc sigtrap
...
use sigtrap 'handler' => \&my_handler, 'normal-signals';
I can't figure out what's wrong.
I'm going to suggest that $fh is the undefined symbol reference. The use statement runs during the compilation phase (as if in a BEGIN {...} block), and, depending on where you think you're calling it, the $fh may never have been opened, or its scope may not be what you think.
Add some debugging to your handler to show, for example, the value of $fh and the refaddr of $fh. Also add to your open/flock code the same. I bet the refaddr isn't the same.
Since the sigtrap is registered globally, you may be best off with a global, e.g.:
use Scalar::Util qw(weaken);
my @handlers;
use sigtrap handler => sub { $_->() for grep defined, @handlers }, 'normal-signals';
{
    my @list;
    open my $fh, ...
    flock $fh, LOCK_EX;
    my $handler = sub { flock $fh, LOCK_UN };
    @handlers = grep defined, @handlers, $handler;
    weaken $_ for @handlers; # the registry itself must not keep handlers alive
    # ... do stuff. When $handler goes out of scope, it'll go undef in @handlers
}
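Here is a signal-free sketch of that registry pattern, so the weak-reference behaviour can be checked directly: instead of pressing ^C, it calls the dispatcher by hand (the names $dispatch and @log are illustrative). The handler is a closure (it captures @log, as the real one captures $fh); an anonymous sub that captures nothing may be shared by perl and never freed, so its weak reference would not go undef.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my @handlers;    # weak references to cleanup callbacks
my @log;         # records which callbacks actually ran

# Stand-in for the sigtrap handler: run every callback still alive.
my $dispatch = sub { $_->() for grep defined, @handlers };

{
    my $handler = sub { push @log, 'unlock' };    # closure over @log
    @handlers = grep defined, @handlers, $handler;
    weaken $_ for @handlers;    # the registry must not keep callbacks alive
    $dispatch->();              # $handler is in scope: 'unlock' is logged
}

$dispatch->();               # $handler went out of scope; its slot is now undef
print scalar(@log), "\n";    # 1
```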

defining a file handle attribute in moose

I am trying to define a file handle attribute in my Perl code using moose as follows:
has validation_log_file_handler => (
is => 'ro', isa => 'File', builder => '_build_validation_log_file_handler'
);
The builder:
sub _build_validation_log_file_handler {
my ($self) = @_;
open(my $fh, ">", $self->validation_log_file)
or die ("ERROR:Can't open file "
. $self->validation_log_file
. " for writing");
return $fh;
}
But when trying to write to a file:
sub run {
my ($self) = @_;
print $self->validation_log_file_handler "Hello\n";
.
.
.
}
I'm new to Moose. Am I doing something wrong? I get the following compilation error:
syntax error. String found where operator expected
Printing to complex filehandles requires curlies:
print { $self->validation_log_file_handler } "Hello\n";
or you could use the OO notation
use IO::Handle; # Required in older versions of Perl
$self->validation_log_file_handler->print("Hello\n");
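Did you define a File class? If not, use Moose's built-in FileHandle type as the isa: it accepts the unblessed glob reference that open my $fh returns, whereas isa => 'IO::Handle' only accepts blessed IO::Handle objects.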
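A core-Perl sketch of both spellings. A hypothetical ValidationLogger class stands in for the Moose one (Moose itself is not needed to see the print syntax), and the accessor returns a filehandle opened on an in-memory string instead of a real log file.

```perl
use strict;
use warnings;
use IO::Handle;    # provides ->print/->flush on lexical handles in older perls

# Hypothetical stand-in for the Moose class: a plain-Perl object whose
# accessor returns a filehandle opened on an in-memory string.
package ValidationLogger;

sub new {
    my ($class) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "Can't open in-memory handle: $!";
    return bless { fh => $fh, buffer => \$buffer }, $class;
}

sub validation_log_file_handler { $_[0]{fh} }

package main;

my $logger = ValidationLogger->new;

# Block form: the braces mark the method call as the filehandle.
print { $logger->validation_log_file_handler } "Hello\n";

# Method form via IO::Handle.
$logger->validation_log_file_handler->print("World\n");
$logger->validation_log_file_handler->flush;

print ${ $logger->{buffer} };    # Hello, then World
```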

Can I get a handle to - source?

It looks like there is a symbol in main called '_<-' (without the quotes) in the same fashion as the other things that look like they could be handles: '_</usr/perl/lib/Carp.pm', for example.
Is there some way to use it?
Or would I have to use a source filter if I hope to read the input source?
In reply to mob: I don't know where Debug would be getting turned on. After I dump out the base table, a dump of %INC shows:
$VAR1 = {
'warnings/register.pm' => 'C:/strawberry/perl/lib/warnings/register.pm',
'XSLoader.pm' => 'C:/strawberry/perl/lib/XSLoader.pm',
'English.pm' => 'C:/strawberry/perl/lib/English.pm',
'Tie/Hash/NamedCapture.pm' => 'C:/strawberry/perl/lib/Tie/Hash/NamedCapture.pm',
'unicore/lib/Perl/_PerlIDS.pl' => 'C:/strawberry/perl/lib/unicore/lib/Perl/_PerlIDS.pl',
'unicore/Heavy.pl' => 'C:/strawberry/perl/lib/unicore/Heavy.pl',
'warnings.pm' => 'C:/strawberry/perl/lib/warnings.pm',
'utf8.pm' => 'C:/strawberry/perl/lib/utf8.pm',
'Config.pm' => 'C:/strawberry/perl/lib/Config.pm',
'overloading.pm' => 'C:/strawberry/perl/lib/overloading.pm',
'Symbol.pm' => 'C:/strawberry/perl/lib/Symbol.pm',
'Carp.pm' => 'C:/strawberry/perl/lib/Carp.pm',
'bytes.pm' => 'C:/strawberry/perl/lib/bytes.pm',
'Exporter/Heavy.pm' => 'C:/strawberry/perl/lib/Exporter/Heavy.pm',
'utf8_heavy.pl' => 'C:/strawberry/perl/lib/utf8_heavy.pl',
'strict.pm' => 'C:/strawberry/perl/lib/strict.pm',
'Exporter.pm' => 'C:/strawberry/perl/lib/Exporter.pm',
'vars.pm' => 'C:/strawberry/perl/lib/vars.pm',
'constant.pm' => 'C:/strawberry/perl/lib/constant.pm',
'Errno.pm' => 'C:/strawberry/perl/lib/Errno.pm',
'overload.pm' => 'C:/strawberry/perl/lib/overload.pm',
'Data/Dumper.pm' => 'C:/strawberry/perl/lib/Data/Dumper.pm'
};
Or would I have to use a source filter if I hope to read the input source?
If the source file has an __END__ or __DATA__ tag, then the DATA filehandle is available. ...that in and of itself is boring. What's interesting is that you can seek to position 0, and that will take you to the top of the source file:
use Carp;
print "Just another Perl hacker,\n";
eval {
no warnings qw/unopened/;
seek DATA, 0, 0
or croak "Script lacking __END__ or __DATA__ tag has no DATA filehandle.";
};
if ( !$@ ) {
while(<DATA>){
print;
}
}
else {
carp $@;
}
__END__
This script will execute (printing 'Just another Perl hacker,'), and then will finish up by printing its own source.
In the code above, if the eval block does trap an exception, the fallback could be to use FindBin and $0, open the source file, and then read it. Putting it all together, here's how it looks:
BEGIN {
use Carp;
sub read_source {
my $source;
local $/ = undef;
eval {
no warnings qw( unopened );
my $DATA_position = tell DATA;
croak "'tell DATA' failed: Probably no __END__ or __DATA__ segment."
if $DATA_position < 0;
seek DATA, 0, 0
or croak
"'seek DATA' failed: Probably no __END__ or __DATA__ segment.";
$source = <DATA>;
seek DATA, $DATA_position, 0 or croak # Must leave *DATA usable.
"seek to reset DATA filehandle failed after read.";
};
if ($@) {
croak $@ if $@ =~ /reset/; # Unstable state: Shouldn't be possible.
eval {
require FindBin;
no warnings 'once';
# $FindBin::Script is the script's basename; $0 may already contain a path.
open my $source_fh, '<', "$FindBin::Bin/$FindBin::Script" or croak $!;
$source = <$source_fh>;
};
croak "Couldn't read source file from *DATA or \$0: $@" if $@;
}
return $source;
}
};
print read_source(), "\n";
This snippet first tries to read from DATA, which eliminates the need to load FindBin and open a new file handle. If that fails, then it tries the FindBin approach. If both fail, it throws an exception. The final successful state slurps the entire source file into $source. The DATA handle will also be restored to the same state it was in before calling this snippet.
That should robustly handle the question of how to read the source file without resorting to a source filter.
You are seeing this in the perl debugger? That is likely where those symbol table entries come from: see the DATA STRUCTURES MAINTAINED BY CORE section of the perldoc in your perl5db.pl file.
The only way I can see to get the _<- entry in the symbol table is to start perl with just the -d switch and then enter a Perl program into standard input, e.g.:
$ perl -d
Loading DB routines from perl5db.pl version 1.32
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
print "Hello world\n";
<Ctrl-D>
main::(-:1): print "Hello world\n";
DB<1>
From here, @{"_<-"} (or @{$main::{"_<-"}}) contains your input, ${"_<-"} or ${$main::{"_<-"}} contains the "name" of your file (just -), and %{"_<-"}/%{$main::{"_<-"}} holds information about breakpoints and actions for stepping through code from the standard input.
Without strict refs, you could also view this data with something like
DB<6> $name="_<-"
DB<7> p ${$name}
-
DB<8> p @{$name}
BEGIN { require 'perl5db.pl' };
print "Hello world\n";
DB<9> p %{$name}
There is no filehandle associated with the symbol table entry for _<- (or for any other _<... symbols).

How to pass a file handle to a function?

When I run the code below I get
Can't use string ("F") as a symbol ref while "strict refs" in use at ./T.pl line 21.
where line 21 is
flock($fh, LOCK_EX);
What am I doing wrong?
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl ':flock', 'SEEK_SET'; # file locking
use Data::Dumper;
# use xx;
my $file = "T.yaml";
my $fh = "F";
my $obj = open_yaml_with_lock($file, $fh);
$obj->{a} = 1;
write_yaml_with_lock($obj, $fh);
sub open_yaml_with_lock {
my ($file, $fh) = @_;
open $fh, '+<', $file;
flock($fh, LOCK_EX);
my $obj = YAML::Syck::LoadFile($fh);
return $obj;
}
sub write_yaml_with_lock {
my ($obj, $fh) = @_;
my $yaml = YAML::Syck::Dump($obj);
$YAML::Syck::ImplicitUnicode = 1;
seek $fh,0, SEEK_SET; # seek back to the beginning of file
print $fh $yaml . "---\n";
close $fh;
}
What you're doing wrong is using the string "F" as a filehandle.
Under use strict that doesn't work, because the string is treated as a
symbolic reference. You could use a bareword as a
filehandle (open FH, ...; print FH ...), or you could pass in an
empty scalar and perl would assign a new open file object to that
variable. But if you pass in the string F, then you need to refer to
the handle as F, not $fh. But, don't do that.
Do this instead:
sub open_yaml_with_lock {
my ($file) = @_;
open my $fh, '+<', $file or die $!;
flock($fh, LOCK_EX) or die $!;
my $obj = YAML::Syck::LoadFile($fh); # this dies on failure
return ($obj, $fh);
}
We're doing several things here. One, we're not storing the
filehandle in a global. Global state makes your program extremely
difficult to understand -- I had a hard time with your 10 line post --
and should be avoided. Just return the filehandle, if you want to
keep it around. Or, you can alias it like open does:
sub open_yaml_with_lock {
open $_[0], '+<', $_[1] or die $!;
...
}
open_yaml_with_lock(my $fh, 'filename');
write_yaml_with_lock($fh);
But really, this is a mess. Put this stuff in an object. Make the
constructor (new) open and lock the file. Add a write method. Done.
Now you can reuse this code (and let others do the same) without
worrying about getting something wrong. Less stress.
The other thing we're doing here is checking errors. Yup, disks can
fail. Files can be typo'd. If you blissfully ignore the return value
of open and flock, then your program may not be doing what you think
it's doing. The file might not be opened. The file might not be
locked properly. One day, your program is not going to work properly
because you spelled "file" as "flie" and the file can't be opened.
You will scratch your head for hours wondering what's going on.
Eventually, you'll give up, go home, and try again later. This time,
you won't typo the file name, and it will work. Several hours will
have been wasted. You'll die several years earlier than you should
because of the accumulated stress. So just use autodie, or write
"or die $!" after your system calls, so that you get an error message
when something goes wrong!
Your script would be correct if you wrote use autodie qw/open flock
seek close/ at the top. (Actually, you should also check that
"print" worked or use
File::Slurp or
syswrite, since autodie can't detect a failing print statement.)
So anyway, to summarize:
Don't open $fh when $fh is defined. Write open my $fh to avoid thinking about this.
Always check the return values of system calls. Make autodie do this for you.
Don't keep global state. Don't write a bunch of functions that are meant to be used together but rely on implicit preconditions like an open file. If functions have preconditions, put them in a class and make the constructor satisfy the preconditions. This way, you can't accidentally write buggy code!
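The advice above can be tried without YAML::Syck. This sketch keeps the same shape (open with '+<', lock, read, return the data together with the handle, then rewrite through that handle) but reads plain text, and it checks print too. The names open_with_lock and write_with_lock are hypothetical, and File::Temp is only used to get a scratch file.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);
use File::Temp qw(tempfile);

# Open, lock and read a file; return the contents AND the handle,
# so the caller holds the lock until it is done.
sub open_with_lock {
    my ($file) = @_;
    open my $fh, '+<', $file or die "Can't open $file: $!";
    flock($fh, LOCK_EX)      or die "Can't lock $file: $!";
    my $content = do { local $/; <$fh> };
    return ($content, $fh);
}

# Rewrite the file through the still-locked handle, then close it
# (closing also releases the lock).
sub write_with_lock {
    my ($content, $fh) = @_;
    seek($fh, 0, SEEK_SET) or die "Can't seek: $!";
    truncate($fh, 0)       or die "Can't truncate: $!";
    print {$fh} $content   or die "Can't write: $!";
    close($fh)             or die "Can't close: $!";
}

# Scratch file with some initial contents.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
print {$tmp_fh} "a: 0\n";
close $tmp_fh or die "Can't close $file: $!";

my ($content, $fh) = open_with_lock($file);
$content =~ s/0/1/;
write_with_lock($content, $fh);

open my $in, '<', $file or die "Can't reopen $file: $!";
my $after = do { local $/; <$in> };
print $after;    # a: 1
```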
Update
OK, here's how to make this more OO. First we'll do "pure Perl" OO
and then use Moose. Moose is
what I would use for any real work; the "pure Perl" is just for the
sake of making it easy to understand for someone new to both OO and
Perl.
package LockedYAML;
use strict;
use warnings;
use Fcntl ':flock', 'SEEK_SET';
use YAML::Syck;
use autodie qw/open flock sysseek syswrite truncate/;
sub new {
    my ($class, $filename) = @_;
    open my $fh, '+<', $filename;
    flock $fh, LOCK_EX;
    my $self = { obj => YAML::Syck::LoadFile($fh), fh => $fh };
    bless $self, $class;
    return $self;
}
sub object { $_[0]->{obj} }
sub write {
    my ($self, $obj) = @_;
    local $YAML::Syck::ImplicitUnicode = 1; # set before Dump, and for us only
    my $yaml = YAML::Syck::Dump($obj);
    my $fh = $self->{fh};
    # use system seek/write to ensure this really does what we
    # mean. optional.
    sysseek $fh, 0, SEEK_SET;
    truncate $fh, 0; # drop the old contents so a shorter document leaves no stale tail
    syswrite $fh, $yaml;
    $self->{obj} = $obj; # to keep things consistent
}
1;
Then, we can use the class in our main program:
use LockedYAML;
use Data::Dumper; # for Dumper() below
my $resource = LockedYAML->new('filename');
print "Our object looks like: ". Dumper($resource->object);
$resource->write({ new => 'stuff' });
Errors will throw exceptions, which can be handled with
Try::Tiny, and the YAML
file will stay locked as long as the instance exists. You can, of
course, have many LockedYAML objects around at once; that's why we
made it OO.
And finally, the Moose version:
package LockedYAML;
use Moose;
use Fcntl qw(:flock SEEK_SET);
use YAML::Syck;
use autodie qw/flock sysseek syswrite truncate/;
use MooseX::Types::Path::Class qw(File);
has 'file' => (
    is       => 'ro',
    isa      => File,
    handles  => ['open'],
    required => 1,
    coerce   => 1,
);
has 'fh' => (
    is         => 'ro',
    isa        => 'FileHandle', # the IO::File object returned by open
    lazy_build => 1,
);
has 'obj' => (
    is         => 'rw',
    isa        => 'HashRef', # or ArrayRef or ArrayRef|HashRef, or whatever
    lazy_build => 1,
    trigger    => sub { shift->_update_obj(@_) },
);
sub _build_fh {
    my $self = shift;
    # 'r+' = read/write without truncating ('rw' is not a valid IO::File mode)
    my $fh = $self->open('r+') or die "Can't open " . $self->file . ": $!";
    flock $fh, LOCK_EX;
    return $fh;
}
sub _build_obj {
    my $self = shift;
    return YAML::Syck::LoadFile($self->fh);
}
sub _update_obj {
    my ($self, $new, $old) = @_;
    return unless $old; # only run if we are replacing something
    local $YAML::Syck::ImplicitUnicode = 1; # must be set before Dump
    my $yaml = YAML::Syck::Dump($new);
    my $fh = $self->fh;
    sysseek $fh, 0, SEEK_SET;
    truncate $fh, 0; # drop the old contents before rewriting
    syswrite $fh, $yaml;
    return;
}
1;
This is used similarly:
use LockedYAML;
my $resource = LockedYAML->new( file => 'filename' );
$resource->obj; # the object
$resource->obj( { new => 'object' }); # automatically saved to disk
The Moose version is longer, but does a lot more runtime consistency
checking and is easier to enhance. YMMV.
From the documentation:
open FILEHANDLE,EXPR
If FILEHANDLE is an undefined scalar variable (or array or hash
element) the variable is assigned a reference to a new anonymous
filehandle, otherwise if FILEHANDLE is an expression, its value is
used as the name of the real filehandle wanted. (This is considered a
symbolic reference, so "use strict 'refs'" should
not be in effect.)
Here FILEHANDLE is an expression ("F"), so its value is used as the name of the real filehandle you want (a filehandle called F). And the documentation says "use strict 'refs'" should not be in effect, because you're using F as a symbolic reference.
(The use strict; at the top of your script includes strict 'refs'.)
Had you just said at the beginning:
my $fh;
This would have worked, because then $fh would become a reference to a new anonymous filehandle and Perl won't try to use it as a symbolic reference.
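A quick way to see both cases side by side, reading from an in-memory string so no file is needed. An undefined scalar gets a new anonymous GLOB reference, while the string "F" (followed by flock, as in the question) triggers the question's strict refs error.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Case 1: an undefined scalar. open puts a new anonymous handle in it.
my $fh;
open $fh, '<', \"some text\n" or die "Can't open: $!";
print ref($fh), "\n";    # GLOB

# Case 2: a string, as in the question. Perl treats "F" as the NAME
# of a handle (a symbolic reference), which "use strict" forbids.
my $named = "F";
my $ok = eval {
    open $named, '<', \"some text\n";
    flock($named, LOCK_SH);
    1;
};
my $error = $@;
print $ok ? "no error\n" : "error: $error";
```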
This works:
#!/usr/bin/perl
my $global_fh;
open_filehandle(\$global_fh);
use_filehandle(\$global_fh);
sub open_filehandle {
my ($fh)=@_;
open($$fh, ">c:\\temp\\testfile") || die;
}
sub use_filehandle {
my($fh) = @_;
# print is peculiar in that it expects the filehandle to be a bareword
# or a simple scalar. Thus, print $$fh "Hello, world!" will not work
# (though print {$$fh} "Hello, world!" would).
my $lfh = $$fh;
print $lfh "Hello, world!";
close($$fh);
}
Or you can do what the other poster suggested and use $_[1] directly, but that's a bit harder to read.
If you use the value directly in the sub, it will work:
use strict;
use warnings;
use autodie;
my $fh;
yada($fh);
print $fh "testing, testing";
sub yada {
open $_[0], '>', 'yada.gg';
}
Or as a reference:
yada(\$fh);
sub yada {
my $handle = shift;
open $$handle, '>', 'yada.gg';
}
Or better yet, return a filehandle:
my $fh = yada($file);
sub yada {
my $inputfile = shift;
open my $gg, '>', $inputfile;
return $gg;
}
Replace
my $fh = "F"; # text and also a ref in nonstrict mode
with
my $fh = \*F; # a reference, period
Of course, it's better yet to use lexical filehandles, as in open my $fd, ... or die ..., but that's not always possible, e.g. with a predefined handle such as STDIN. In such cases, use \*FD wherever $fd fits.
With old scripts, also watch out for where a global FD is opened and closed.
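A sketch of passing glob references to a sub that prints, under use strict. The sub name say_hello is hypothetical, and the bareword handle F writes to an in-memory string so the result can be checked.

```perl
use strict;
use warnings;

# Accepts any filehandle reference: lexical, \*NAME, or \*STDOUT.
sub say_hello {
    my ($fh) = @_;
    print {$fh} "hello\n" or die "print failed: $!";
}

my $buffer = '';
open F, '>', \$buffer or die "Can't open: $!";    # old-style bareword handle
my $fh = \*F;                                      # a real reference, fine under strict
say_hello($fh);
close F or die "close failed: $!";
print "captured: $buffer";                         # captured: hello

say_hello(\*STDOUT);    # predefined handles work the same way
```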

Perl: How to pass and use a lexical file handle to a subroutine as a named argument?

I want to pass a lexical file handle to a subroutine using a named argument, but the following does not compile:
#!/usr/bin/perl -w
use strict;
my $log_fh;
my $logname = "my.log";
sub primitive {
my ($fh, $m) = @_;
print $fh $m;
}
sub sophisticated {
my ($args) = @_;
print $args->{m};
print $args->{fh} $args->{m} ;
}
open $log_fh, ">", $logname;
print $log_fh "Today I learned ...\n";
primitive($log_fh,"... the old way works ...\n");
sophisticated({
fh=>$log_fh,
m=>"... and the new way requires an intervention by SO.",
});
close $log_fh;
The complaint is:
Scalar found where operator expected at ./lexical.file.handle.pl line 15, near "} $args"
(Missing operator before $args?)
$ perl --version
This is perl, v5.10.1
It works OK when I use the primitive technique of passing arguments, and the named-argument hash technique works for the message portion, just not for the file handle portion. Do I need a new version of print?
When you've got a complex expression that returns a filehandle (like $args->{fh}) you'll need to disambiguate the syntax a bit by adding some extra curlies:
print { $args->{fh} } $args->{m};
This is due to the weird way the print operator is designed, with no comma between the filehandle and the list of stuff to print.
Alternatively, you could grab the filehandle out of your arguments hashref first, e.g.
my $fh = $args->{fh};
print $fh $args->{m};
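A runnable version of the fix, logging to an in-memory string instead of my.log so the result can be checked. The sophisticated sub is the question's, with the braces added.

```perl
use strict;
use warnings;

sub sophisticated {
    my ($args) = @_;
    # The braces turn the whole hash lookup into the filehandle slot.
    print { $args->{fh} } $args->{m} or die "print failed: $!";
}

my $log = '';
open my $log_fh, '>', \$log or die "Can't open in-memory log: $!";
sophisticated({ fh => $log_fh, m => "... and the new way works too.\n" });
close $log_fh or die "close failed: $!";
print $log;
```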
friedo's answer covers your problem, but there's a stylistic issue I'd like to point out. You don't need to wrap everything in an anonymous hash to emulate named arguments. A hash initializer is just a list interpreted as key/value pairs. Passing such a list to a sub provides a cleaner syntax for the caller:
sub sophisticated {
my %arg = @_;
print $arg{m};
print {$arg{fh}} $arg{m};
}
sophisticated(fh => $log_fh, m => "Hello, world!\n");