Can I get a handle to - source? - perl

It looks like there is a symbol in main called '_<-' (without the quotes) in the same fashion as the other things that look like they could be handles: '_</usr/perl/lib/Carp.pm', for example.
Is there some way to use it?
Or would I have to use a source filter if I hope to read the input source?
In reply to mob: I don't know where Debug would be getting turned on. After I dump out the base table, a dump of %INC shows:
$VAR1 = {
'warnings/register.pm' => 'C:/strawberry/perl/lib/warnings/register.pm',
'XSLoader.pm' => 'C:/strawberry/perl/lib/XSLoader.pm',
'English.pm' => 'C:/strawberry/perl/lib/English.pm',
'Tie/Hash/NamedCapture.pm' => 'C:/strawberry/perl/lib/Tie/Hash/NamedCapture.pm',
'unicore/lib/Perl/_PerlIDS.pl' => 'C:/strawberry/perl/lib/unicore/lib/Perl/_PerlIDS.pl',
'unicore/Heavy.pl' => 'C:/strawberry/perl/lib/unicore/Heavy.pl',
'warnings.pm' => 'C:/strawberry/perl/lib/warnings.pm',
'utf8.pm' => 'C:/strawberry/perl/lib/utf8.pm',
'Config.pm' => 'C:/strawberry/perl/lib/Config.pm',
'overloading.pm' => 'C:/strawberry/perl/lib/overloading.pm',
'Symbol.pm' => 'C:/strawberry/perl/lib/Symbol.pm',
'Carp.pm' => 'C:/strawberry/perl/lib/Carp.pm',
'bytes.pm' => 'C:/strawberry/perl/lib/bytes.pm',
'Exporter/Heavy.pm' => 'C:/strawberry/perl/lib/Exporter/Heavy.pm',
'utf8_heavy.pl' => 'C:/strawberry/perl/lib/utf8_heavy.pl',
'strict.pm' => 'C:/strawberry/perl/lib/strict.pm',
'Exporter.pm' => 'C:/strawberry/perl/lib/Exporter.pm',
'vars.pm' => 'C:/strawberry/perl/lib/vars.pm',
'constant.pm' => 'C:/strawberry/perl/lib/constant.pm',
'Errno.pm' => 'C:/strawberry/perl/lib/Errno.pm',
'overload.pm' => 'C:/strawberry/perl/lib/overload.pm',
'Data/Dumper.pm' => 'C:/strawberry/perl/lib/Data/Dumper.pm'
};

Or would I have to use a source filter if I hope to read the input source?
If the source file has an __END__ or __DATA__ tag, then the DATA filehandle is available. ...that in and of itself is boring. What's interesting is that you can seek to position 0, and that will take you to the top of the source file:
use Carp;
print "Just another Perl hacker,\n";
eval {
no warnings qw/unopened/;
seek DATA, 0, 0
or croak "Script lacking __END__ or __DATA__ tag has no DATA filehandle.";
};
if( !$# ) {
while(<DATA>){
print;
}
}
else {
carp $#;
}
__END__
This script will execute (printing 'Just another Perl hacker,'), and then will finish up by printing its own source.
In the code above, if the eval block does trap an exception, the fallback could be to use FindBin and $0, open the source file, and then read it. Putting it all together, here's how it looks:
BEGIN {
use Carp;
sub read_source {
my $source;
local $/ = undef;
eval {
no warnings qw( unopened );
my $DATA_position = tell DATA;
croak "'tell DATA' failed: Probably no __END__ or __DATA__ segment."
if $DATA_position < 0;
seek DATA, 0, 0
or croak
"'seek DATA' failed: Probably no __END__ or __DATA__ segment.";
$source = <DATA>;
seek DATA, $DATA_position, 0 or croak # Must leave *DATA usable.
"seek to reset DATA filehandle failed after read.";
};
if ($#) {
croak $# if $# =~ /reset/; # Unstable state: Shouldn't be possible.
eval {
require FindBin;
no warnings 'once';
open my $source_fh, $FindBin::Bin . '/' . $0 or croak $!;
$source = <$source_fh>;
};
croak "Couldn't read source file from *DATA or \$0: $#" if $#;
}
return $source;
}
};
print read_source(), "\n";
This snippet first tries to read from DATA, which eliminates the need to load FindBin and open a new file handle. If that fails, then it tries the FindBin approach. If both fail, it throws an exception. The final successful state slurps the entire source file into $source_code. The DATA handle will also be restored to the same state it was in before calling this snippet.
That should robustly handle the question of how to read the source file without resorting to a source filter.

You are seeing this in the perl debugger? That is likely where those symbol table entries come from: see the DATA STRUCTURES MAINTAINED BY CORE section of the perldoc in your perl5db.pl file.
The only way I can see to get the _<- entry in the symbol table is to start perl with just the -d switch and then enter a Perl program into standard input, e.g.:
$ perl -d
Loading DB routines from perl5db.pl version 1.32
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
print "Hello world\n";
<Ctrl-D>
main::(-:1): print "Hello world\n";
DB<1>
From here, #{"_<-"} (or #{$main::{"_<-"}}) contains your input, ${"_<-"} or ${$main::{"_<-"}} contains the "name" of your file (just -), and %{"_<-"}/%{$main::{"_<-"}} holds information about breakpoints and actions for stepping through code from the standard input.
Without strict refs, you could also view this data with something like
DB<6> $name="_<-"
DB<7> p ${$name}
-
DB<8> p #{$name}
BEGIN { require 'perl5db.pl' };
print "Hello world\n";
DB<9> p %{$name}
There is no filehandle associated with the symbol table entry for _<- (or for any other _<... symbols).

Related

perl - use command-line argument multiple times

I'm modifying a perl script in which the command line arguments are parsed like this:
if ($arg eq "-var1") {
$main::variable1 = shift(#arguments)
} elsif ($arg eq "-var2") {
$main::variable2 = shift(#arguments)
} elsif ($arg eq "var3") {
$main::variable3 = shift(#arguments)
} ...
So there is a whole bunch of elsif statements to cover all command-line arguments.
I'm now in a situaton where I want to use the argument '-var2' multiple times.
So my main::variable2 should maybe be an array that contains all values that are passed with "-var2".
I found that with Perl::getopt, this can be easily achieved (Perl Getopt Using Same Option Multiple Times).
However the way that my script parses its command-line arguments is different. So I was wondering if it could be achieved, without having to change the way the arguments are parsed.
That's not your actual code, is it? It won't even compile.
I'd be really surprised if Getopt::Long can't solve your problem and it's really a better idea to use a library rather than writing your own code.
But changing your code to store -var2 options in an array is simple enough.
my ($variable1, #variable2, $variable3);
if ($arg eq "-var1") {
$variable1 = shift(#arguments)
} elsif ($arg eq "-var2") {
push #variable2, shift(#arguments)
} elsif ($arg eq "-var3") {
$variable3 = shift(#arguments)
}
(I've also removed the main:: from your variables and added the, presumably missing, $s. It's really unlikely that you want to be using package variables rather than lexical variables.)
This particular wheel already exists. Please don't try to reinvent it. That just makes it a pain for the people trying to use your script. There's no reason to force people to learn a whole new set of rules in order to execute your program.
use File::Basename qw( basename );
use Getopt::Long qw( );
my $foo;
my #bars;
my $baz;
sub help {
my $prog = basename($0);
print
"Usage:
$prog [options]
$prog --help
Options:
--foo foo
...
--bar bar
May be used multiple times.
...
--baz baz
...
";
exit(0);
}
sub usage {
if (#_) {
my ($msg) = #_;
chomp($msg);
say STDERR $msg;
}
my $prog = basename($0);
say STDERR "Try '$prog --help' for more information.";
exit(1);
}
sub parse_args {
Getopt::Long::Configure(qw( posix_default ));
Getopt::Long::GetOptions(
"help" => \&help,
"foo=s" => \$foo,
"bar=s" => \#bars,
"baz=s" => \$baz,
)
or usage();
!#ARGV
or usage("Too many arguments");
return #ARGV;
}
main(parse_args());
Well, it is good practice to document your core -- you would appreciate it as soon as you return to make changes
NOTE: in Linux it requires perl-doc package to be installed to use --man option in full extent
#!/usr/bin/perl
#
# Description:
# Describe purpose of the program
#
# Parameters:
# Describe parameters purpose
#
# Date: Tue Nov 29 1:18:00 UTC 2019
#
use warnings;
use strict;
use Getopt::Long qw(GetOptions);
use Pod::Usage;
my %opt;
GetOptions(
'input|i=s' => \$opt{input},
'output|o=s' => \$opt{output},
'debug|d' => \$opt{debug},
'help|?' => \$opt{help},
'man' => \$opt{man}
) or pod2usage(2);
pod2usage(1) if $opt{help};
pod2usage(-exitval => 0, -verbose => 2) if $opt{man};
print Dumper(\%opt) if $opt{debug};
__END__
=head1 NAME
program - describe program's functionality
=head1 SYNOPSIS
program.pl [options]
Options:
-i,--input input filename
-o,--output output filename
-d,--debug output debug information
-?,--help brief help message
--man full documentation
=head1 OPTIONS
=over 4
=item B<-i,--input>
Input filename
=item B<-o,--output>
Output filename
=item B<-d,--debug>
Print debug information.
=item B<-?,--help>
Print a brief help message and exits.
=item B<--man>
Prints the manual page and exits.
=back
B<This program> accepts B<several parameters> and operates with B<them> to produce some B<result>
=cut

Perl HTML::parser error; Undefined subroutine &main::1

I am getting the error
Undefined subroutine &main::1 called at /usr/local/lib/perl/5.10.0/HTML/Parser.pm line 102.
Here is my code
#open (IN, "<", "foo.html") or die "can't open source file: $!";
my $p = HTML::Parser->new( api_version => 3,
start_h => [&start, "tagname, attr, text"],
text_h => [&text, "text"],
default_h => [sub { print OUT shift }, "text"],
);
$p->utf8_mode;
$p->empty_element_tags;
$p->ignore_elements(qw(br));
$p->parse_file("foo.html") or die "parsing failed: $!";
#while (<IN>) {
# $p->parse($_) || die "parsing failed: $!";
#}
#$p->eof;
#close IN;
As you can see in the commented out parts I have also tried directly opening and calling parse (with equally little luck).
The file does open fine.
Parser.pm line 102 which is error mentions is the parse_file subroutine, specifically the line calling ->parse
I have no clue where parse is, it is not in HTML::Parser nor did I find it in HTML::Entities the only dependency HTML::Parser has. =/ I am afraid I am lost at this point, the deepest magics of PERL are still a mystery to me.
Try using \&start and \&text:
my $p = HTML::Parser->new( api_version => 3,
start_h => [\&start, "tagname, attr, text"],
text_h => [\&text, "text"],
default_h => [sub { print OUT shift }, "text"],
);
Otherwise you are passing the result of calling start() and text(), not references to them as subs.
In the documentation it says you should use \&start. If you exclude the backslash, it will be using the return value from the function start instead (which will be using the #_ as argument list, as per normal subroutine calling pragma using the &). This value could be 1.
Here is an example:
C:\perl>perl -we "$c=\&s; sub s { print 'yada' }; $c->();"
yada
C:\perl>perl -we "$c=&s; sub s { print 'yada' }; $c->();"
Undefined subroutine &main::1 called at -e line 1.
yada
Not sure why the error turns up there, but you might change it, see if it helps.
Oh, also, it does seem like you are not using use strict. When using strict, I get a much more helpful error:
C:\perl>perl -we "use strict; my $c=&s; sub s { print 'a' }; $c->();"
Can't use string ("1") as a subroutine ref while "strict refs" in use at -e line

How to pass a file handle to a function?

When I run the code below I get
Can't use string ("F") as a symbol ref while "strict refs" in use at ./T.pl line 21.
where line 21 is
flock($fh, LOCK_EX);
What am I doing wrong?
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl ':flock', 'SEEK_SET'; # file locking
use Data::Dumper;
# use xx;
my $file = "T.yaml";
my $fh = "F";
my $obj = open_yaml_with_lock($file, $fh);
$obj->{a} = 1;
write_yaml_with_lock($obj, $fh);
sub open_yaml_with_lock {
my ($file, $fh) = #_;
open $fh, '+<', $file;
flock($fh, LOCK_EX);
my $obj = YAML::Syck::LoadFile($fh);
return $obj;
}
sub write_yaml_with_lock {
my ($obj, $fh) = #_;
my $yaml = YAML::Syck::Dump($obj);
$YAML::Syck::ImplicitUnicode = 1;
seek $fh,0, SEEK_SET; # seek back to the beginning of file
print $fh $yaml . "---\n";
close $fh;
}
What you're doing wrong is using the string "F" as a filehandle. This
has never been something that's worked; you could use a bareword as a
filehandle (open FH, ...; print FH ...), or you could pass in an
empty scalar and perl would assign a new open file object to that
variable. But if you pass in the string F, then you need to refer to
then handle as F, not $fh. But, don't do that.
Do this instead:
sub open_yaml_with_lock {
my ($file) = #_;
open my $fh, '+<', $file or die $!;
flock($fh, LOCK_EX) or die $!;
my $obj = YAML::Syck::LoadFile($fh); # this dies on failure
return ($obj, $fh);
}
We're doing several things here. One, we're not storing the
filehandle in a global. Global state makes your program extremely
difficult to understand -- I had a hard time with your 10 line post --
and should be avoided. Just return the filehandle, if you want to
keep it around. Or, you can alias it like open does:
sub open_yaml_with_lock {
open $_[0], '+<', $_[1] or die $!;
...
}
open_yaml_with_lock(my $fh, 'filename');
write_yaml_with_lock($fh);
But really, this is a mess. Put this stuff in an object. Make new
open and lock the file. Add a write method. Done. Now you can
reuse this code (and let others do the same) without worrying about
getting something wrong. Less stress.
The other thing we're doing here is checking errors. Yup, disks can
fail. Files can be typo'd. If you blissfully ignore the return value
of open and flock, then your program may not be doing what you think
it's doing. The file might not be opened. The file might not be
locked properly. One day, your program is not going to work properly
because you spelled "file" as "flie" and the file can't be opened.
You will scratch your head for hours wondering what's going on.
Eventually, you'll give up, go home, and try again later. This time,
you won't typo the file name, and it will work. Several hours will
have been wasted. You'll die several years earlier than you should
because of the accumulated stress. So just use autodie or write or
die $! after your system calls so that you get an error message when
something goes wrong!
Your script would be correct if you wrote use autodie qw/open flock
seek close/ at the top. (Actually, you should also check that
"print" worked or use
File::Slurp or
syswrite, since autodie can't detect a failing print statement.)
So anyway, to summarize:
Don't open $fh when $fh is defined. Write open my $fh to
avoid thinking about this.
Always check the return values of system calls. Make autodie do
this for you.
Don't keep global state. Don't write a bunch of functions that
are meant to be used together but rely on implicit preconditions
like an open file. If functions have preconditions, put them in
a class and make the constructor satisfy the preconditions.
This way, you can't accidentally write buggy code!
Update
OK, here's how to make this more OO. First we'll do "pure Perl" OO
and then use Moose. Moose is
what I would use for any real work; the "pure Perl" is just for the
sake of making it easy to understand for someone new to both OO and
Perl.
package LockedYAML;
use strict;
use warnings;
use Fcntl ':flock', 'SEEK_SET';
use YAML::Syck;
use autodie qw/open flock sysseek syswrite/;
sub new {
my ($class, $filename) = #_;
open my $fh, '+<', $filename;
flock $fh, LOCK_EX;
my $self = { obj => YAML::Syck::LoadFile($fh), fh => $fh };
bless $self, $class;
return $self;
}
sub object { $_[0]->{obj} }
sub write {
my ($self, $obj) = #_;
my $yaml = YAML::Syck::Dump($obj);
local $YAML::Syck::ImplicitUnicode = 1; # ensure that this is
# set for us only
my $fh = $self->{fh};
# use system seek/write to ensure this really does what we
# mean. optional.
sysseek $fh, 0, SEEK_SET;
syswrite $fh, $yaml;
$self->{obj} = $obj; # to keep things consistent
}
Then, we can use the class in our main program:
use LockedYAML;
my $resource = LockedYAML->new('filename');
print "Our object looks like: ". Dumper($resource->object);
$resource->write({ new => 'stuff' });
Errors will throw exceptions, which can be handled with
Try::Tiny, and the YAML
file will stay locked as long as the instance exists. You can, of
course, have many LockedYAML objects around at once, that's why we
made it OO.
And finally, the Moose version:
package LockedYAML;
use Moose;
use autodie qw/flock sysseek syswrite/;
use MooseX::Types::Path::Class qw(File);
has 'file' => (
is => 'ro',
isa => File,
handles => ['open'],
required => 1,
coerce => 1,
);
has 'fh' => (
is => 'ro',
isa => 'GlobRef',
lazy_build => 1,
);
has 'obj' => (
is => 'rw',
isa => 'HashRef', # or ArrayRef or ArrayRef|HashRef, or whatever
lazy_build => 1,
trigger => sub { shift->_update_obj(#_) },
);
sub _build_fh {
my $self = shift;
my $fh = $self->open('rw');
flock $fh, LOCK_EX;
return $fh;
}
sub _build_obj {
my $self = shift;
return YAML::Syck::LoadFile($self->fh);
}
sub _update_obj {
my ($self, $new, $old) = #_;
return unless $old; # only run if we are replacing something
my $yaml = YAML::Syck::Dump($new);
local $YAML::Syck::ImplicitUnicode = 1;
my $fh = $self->fh;
sysseek $fh, 0, SEEK_SET;
syswrite $fh, $yaml;
return;
}
This is used similarly:
use LockedYAML;
my $resource = LockedYAML->new( file => 'filename' );
$resource->obj; # the object
$resource->obj( { new => 'object' }); # automatically saved to disk
The Moose version is longer, but does a lot more runtime consistency
checking and is easier to enhance. YMMV.
From the documentation:
open FILEHANDLE,EXPR
If FILEHANDLE is an undefined scalar variable (or array or hash
element) the variable is assigned a reference to a new anonymous
filehandle, otherwise if FILEHANDLE is an expression, its value is
used as the name of the real filehandle wanted. (This is considered a
symbolic reference, so "use strict 'refs'" should
not be in effect.)
Filehandle here is an expression ("F") so itsvalue is used as the name of the real filehandle you want. (A filehandle called F). And then... the documentation says "use strict 'refs'" should not be in effect, because you're using F as a symbolic reference.
(use strict; on line 1 includes strict 'refs'.)
Had you just said at the beginning:
my $fh;
This would have worked, because then $fh would become a reference to a new anonymous filehandle and Perl won't try to use it as a symbolic reference.
This works:
#!/usr/bin/perl
my $global_fh;
open_filehandle(\$global_fh);
use_filehandle(\$global_fh);
sub open_filehandle {
my ($fh)=#_;
open($$fh, ">c:\\temp\\testfile") || die;
}
sub use_filehandle {
my($fh) = #_;
# Print is pecular that it expects the next token to be the filehandle
# or a simple scalar. Thus, print $$fh "Hello, world!" will not work.
my $lfh = $$fh;
print $lfh "Hello, world!";
close($$fh);
}
Or you can do what the other poster suggested and use $_[1] directly, but that's a bit harder to read.
If you use the value directly in the sub, it will work:
use strict;
use warnings;
use autodie;
my $fh;
yada($fh);
print $fh "testing, testing";
sub yada {
open $_[0], '>', 'yada.gg';
}
Or as a reference:
yada(\$fh);
sub yada {
my $handle = shift;
open $$handle, '>', 'yada.gg';
}
Or better yet, return a filehandle:
my $fh = yada($file);
sub yada {
my $inputfile = shift;
open my $gg, '>', $inputfile;
return $gg;
}
Replace
my $fh = "F"; # text and also a ref in nonstrict mode
with
my $fh = \*F; # a reference, period
Of course, it's better yet to use lexical filehandles, as in open my $fd, ... or die ..., but that's not always possible, e.g. you have STDIN that's predefined. In such cases, use \*FD wherever $fd fits.
There's also a case with old scripts, you have to watch out where a global FD is opened and closed.

perl html parsing lib/tool

Is there some powerful tools/libs for perl like BeautifulSoup to python?
Thanks
HTML::TreeBuilder::XPath is a decent solution for most problems.
I never used BeautifulSoup, but from quick skim over its documentation you might want HTML::TreeBuilder. It can process even broken documents well and allows traverse over parsed tree or query items - look at look_down method in HTML::Element.
If you like/know XPath, see daxim's recommendation. If you like to pick items via CSS selector, have a look at Web::Scraper or Mojo::DOM.
As you're looking for power, you can use XML::LibXML to parse HTML. The advantage then is that you have all the power of the fastest and best XML toolchain (excecpt MSXML, which is MS only) available to Perl to process your document, including XPath and XSLT (which would require a re-parse if you used another parser than XML::LibXML).
use strict;
use warnings;
use XML::LibXML;
# In 1.70, the recover and suppress_warnings options won't shup up the
# warnings. Hence, a workaround is needed to keep the messages away from
# the screen.
sub shutup_stderr {
my( $subref, $bufref ) = #_;
open my $fhbuf, '>', $bufref;
local *STDERR = $fhbuf;
$subref->(); # execute code that needs to be shut up
return;
}
# ==== main ============================================================
my $url = shift || 'http://www.google.de';
my $parser = XML::LibXML->new( recover => 2 ); # suppress_warnings => 1
# Note that "recover" and "suppress_warnings" might not work - see above.
# https://rt.cpan.org/Public/Bug/Display.html?id=58024
my $dom; # receive document
shutup_stderr
sub { $dom = $parser->load_html( location => $url ) }, # code
\my $errmsg; # buffer
# Now process document as XML.
my #nodes = $dom->getElementsByLocalName( 'title' );
printf "Document title: %s\n", $_->textContent for #nodes;
printf "Lenght of error messages: %u\n", length $errmsg;
print '-' x 72, "\n";
print $dom->toString( 1 );

How do I determine whether a Perl file handle is a read or write handle?

You are given either an IO::File object or a typeglob (\*STDOUT or Symbol::symbol_to_ref("main::FH")); how would you go about determining if it is a read or write handle? The interface cannot be extended to pass this information (I am overriding close to add calls to flush and sync before the actual close).
Currently I am attempting to flush and sync the filehandle and ignoring the error "Invalid argument" (which is what I get when I attempt to flush or sync a read filehandle):
eval { $fh->flush; 1 } or do {
#this seems to exclude flushes on read handles
unless ($! =~ /Invalid argument/) {
croak "could not flush $fh: $!";
}
};
eval { $fh->sync; 1 } or do {
#this seems to exclude syncs on read handles
unless ($! =~ /Invalid argument/) {
croak "could not sync $fh: $!";
}
};
Have a look at the fcntl options. Maybe F_GETFL with O_ACCMODE.
Edit: I did a little googling and playing over lunch and here is some probably non-portable code but it works for my Linux box, and probably any Posix system (perhaps even Cygwin, who knows?).
use strict;
use Fcntl;
use IO::File;
my $file;
my %modes = ( 0 => 'Read only', 1 => 'Write only', 2 => 'Read / Write' );
sub open_type {
my $fh = shift;
my $mode = fcntl($fh, F_GETFL, 0);
print "File is: " . $modes{$mode & 3} . "\n";
}
print "out\n";
$file = new IO::File();
$file->open('> /tmp/out');
open_type($file);
print "\n";
print "in\n";
$file = new IO::File();
$file->open('< /etc/passwd');
open_type($file);
print "\n";
print "both\n";
$file = new IO::File();
$file->open('+< /tmp/out');
open_type($file);
Example output:
$ perl test.pl
out
File is: Write only
in
File is: Read only
both
File is: Read / Write