Why are there several c files in symbol table of %main::? - perl

'_<perlmain.c' => *{'::_<perlmain.c'},
'_</usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/auto/Data/Dumper/Dumper.so' => *{'::_</usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/auto/Data/Dumper/Dumper.so'},
'_<universal.c' => *{'::_<universal.c'},
'_<xsutils.c' => *{'::_<xsutils.c'},
...
Why are they in the symbol table of %main::,when are they useful?

To repeat the output from the question, run
#! /usr/bin/env perl
use Data::Dumper;
print Dumper \%main::;
The entries you see are inserted in gv_fetchfile_flags:
/* This is where the debuggers %{"::_<$filename"} hash is created */
tmpbuf[0] = '_';
tmpbuf[1] = '<';
memcpy(tmpbuf + 2, name, namelen);
gv = *(GV**)hv_fetch(PL_defstash, tmpbuf, tmplen, TRUE);
if (!isGV(gv)) {
gv_init(gv, PL_defstash, tmpbuf, tmplen, FALSE);
#ifdef PERL_DONT_CREATE_GVSV
GvSV(gv) = newSVpvn(name, namelen);
#else
sv_setpvn(GvSV(gv), name, namelen);
#endif
}
This is called many times by way of newXS as part of the boot process in S_parse_body.
boot_core_PerlIO();
boot_core_UNIVERSAL();
boot_core_mro();
Note that you also see entries for perlio.c, universal.c, and mro.c in the output.
The Debugger Internals section of the perldebguts documentation explains their use:
For example, whenever you call Perl's built-in caller function from the package DB, the arguments that the corresponding stack frame was called with are copied to the #DB::args array. These mechanisms are enabled by calling Perl with the -d switch. Specifically, the following additional features are enabled (cf. $^P in perlvar):
…
Each array #{"_<$filename"} holds the lines of $filename for a file compiled by Perl. The same is also true for evaled strings that contain subroutines, or which are currently being executed. The $filename for evaled strings looks like (eval 34). Code assertions in regexes look like (re_eval 19).
Each hash %{"_<$filename"} contains breakpoints and actions keyed by line number. Individual entries (as opposed to the whole hash) are settable. Perl only cares about Boolean true here, although the values used by perl5db.pl have the form "$break_condition\0$action".
The same holds for evaluated strings that contain subroutines, or which are currently being executed. The $filename for evaled strings looks like (eval 34) or (re_eval 19) .
Each scalar ${"_<$filename"} contains "_<$filename". This is also the case for evaluated strings that contain subroutines, or which are currently being executed. The $filename for evaled strings looks like (eval 34) or (re_eval 19).
After each required file is compiled, but before it is executed, DB::postponed(*{"_<$filename"}) is called if the subroutine DB::postponed exists. Here, the $filename is the expanded name of the required file, as found in the values of %INC.
…

As perl is a interpreter based language, it needs, right, his interpreter, the perl binary. This binary just reads the perl script, and executes the code by translating it in machine code.
Your perl interpreter is compiled with debug symbols, so it contains informations about the source files it build of. Also you see the objects of loaded modules Data::Dumper in your example.
Hope that helps

Related

Perl subroutines

Here I fixed most of my mistakes and thank you all, any other advice please with my hash at this point and how can I clear each word and puts the word and its frequency in a hash, excluding the empty words.. I think my code make since now.
So you can focus on the key part of the algorithm, how about accepting input on STDIN and output to STDOUT. That way there's no argument checking, etc. Just a simple:
$ prog < words.txt
All you really need is a very simple algorithm:
Read a line
Split it into words
Record a count of the word
When done, display the counts
Here's a sample program
#! /usr/bin/perl -w
use strict;
my (%data);
while (<STDIN>) {
chomp;
my(#words) = split(/\s+/);
foreach my $word (#words) {
if (!defined($data{$word})) {
$data{$word} = 0;
}
$data{$word}++;
}
}
foreach (sort(keys(%data))) {
print "$_: $data{$_}\n";
}
Once you understand this and have it working in your environment, you can extend it to meet your other requirements:
remove non-alphabetic characters from each word
print three results per line
use input and output files
put the algorithm into a subroutine
I agree that starting with dave's answer would be more productive, but if you are interested in your mistakes, here is what I see:
You assign the return value of checkArgs to a scalar variable $checkArgs, but return an array value. It means that $checkArgs will always contain 2 (the size of the array) after this call (because the program dies if the number of arguments is not 2). It is not very bad since you do not use the value later, but why you need it at all in this case?
You open files and close them immediately without reading from them. Does not make sense.
Statement
while (<>)
reads either from standard output or from all files in the command line arguments. The latter variant is like what you want, but your second argument is the output file, not input. The diamond operator will try to read from it too. You have two options: a) use only one file name in the command line arguments, read the file with <>, use standard output for output, and redirect output to a file in shell; b) use
while(<$file1>)
instead, of course, before closing files. Option a) is the traditional Unix- and Perl-style, but b) provides for clearer code for beginners.
Statements
return $word;
and
return $str, $hash{$str};
return corresponding values on the first iterations of the loops, all other data remain unprocessed. In the first case, you should create a local array, store all $word in it and return the array as a whole. In the second case, you already have such local %hash, it is enough to return this hash. In both cases, you need should assign the return values of the functions not to scalars, but to an array and a hash correspondingly. Now, you actually lose all you data.

Tracing the source of an imported subroutine

I'm modifying a rather large unit test that uses 27 modules before loading the test framework:
use Test::Most;
When the script reaches this line, it outputs the following warning:
mytest.t ........... Subroutine main::explain redefined at mytest.t line 84.
Now I can hide the redefine messages by simply undefining the the subroutine before calling use.
BEGIN {
undef *explain; # Method imported somewhere before. Hide the redefine messages
}
use Test::Most;
However, I'd like to determine which module is importing the other version of explain.
Could use a process of elimination and just comment out everything until I get the warning, but would be nice if there was a more direct route to determining the source.
Inserting use Devel::Peek qw( ); BEGIN { Devel::Peek::Dump(\&foo); } before the line that gives the warning will tell you which package (COMP_STASH) and file name (FILE).
A solution that also gets you the line number is possible. The function's opcode tree could be walked until a nextstate is found (which is probably the very first op of the tree). The file name and line number can be extracted from the op. nextstate ops set the file and line number issued by runtime warnings.
Notes:
#line directives affect both solutions.
If a module exports an imported sub, both solution would give the original package and file of origin, not of the intermediary.
You can use perl's introspection facility (called B) for this:
use B;
my $gv = B::svref_2object(\&explain)->GV;
printf "%s::%s file %s line %s\n", $gv->STASH->NAME, $gv->NAME, $gv->FILE, $gv->LINE;
Test::Most::explain file /usr/share/perl5/Test/Most.pm line 175
(line is the end of the sub, not the beginning)

Special character in the file exit the while loop in Perl

I wrote a simple parser for a .txt file with the following instructions:
my $file2 = "test.txt";
open ($process, "<",$file2) or die "couldn't manage to open the file:$file2!";
while (<$process>)
{
...
}
In some files that I am trying to parse there is a special character that is like the right arrow (->) and that I don't manage to paste here from the file.
Every time the parser hits that character (->), it exits the file without processing it till the end.
Is there a way to avoid it and continue processing the file till the very end?
I am using perl 5.6.1 (I cannot use a newer one) and the files that I need to process might have these special characters.
Thanks for your help.
I don't think it's perl that's causing your problem, but almost certainly something in that middle missing block. Are you using eval in the while block on the input from the file? This is a minimal example that shows that the stream containing -> doesn't cause difficulties:
#!/usr/bin/env perl
use warnings;
use strict;
while(<DATA>) {
print "Data[$.]: $_";
}
__DATA__
this is some data
this is also some data
-> this looks fine
foo->dingle also looks fine
This produces:
$ perl ./foo.pl
Data[1]: this is some data
Data[2]: this is also some data
Data[3]: -> this looks fine
Data[4]: foo->dingle also looks fine
So, the -> characters in perl are special:
"-> " is an infix dereference operator, just as it is in C and C++. If
the right side is either a [...] , {...} , or a (...) subscript, then
the left side must be either a hard or symbolic reference to an array,
a hash, or a subroutine respectively. (Or technically speaking, a
location capable of holding a hard reference, if it's an array or hash
reference being used for assignment.) See perlreftut and perlref.
Otherwise, the right side is a method name or a simple scalar variable
containing either the method name or a subroutine reference, and the
left side must be either an object (a blessed reference) or a class
name (that is, a package name). See perlobj.
So if I had to guess, you're definitely using eval to try to parse your content and it's failing to dereference the left side of the operator and crashing. Please provide command line error messages or the code in the while loop if you want further assistance.

Meaning of the <*> symbol

I've recently been exposed to a bit of Perl code, and some aspects of it are still elusive to me. This is it:
#collection = <*>;
I understand that the at-symbol defines collection as an array. I've also searched around a bit, and landed on perldoc, specifically at the part about I/O Operators. I found the null filelhandle specifically interesting; code follows.
while (<>) {
...
}
On the same topic I have also noticed that this syntax is also valid:
while (<*.c>) {
...
}
According to perldoc It is actually calling an internal function that invokes glob in a manner similar as the following code:
open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
while (<FOO>) {
...
}
Question
What does the less-than, asterisk, more-than (<*>) symbol mentioned on the first line actually do? Is it a reference to an internally open and referenced glob? Would it be a special case, such as the null filehandle? Or can it be something entirely different, like a legacy implementation?
<> (the diamond operator) is used in two different syntaxes.
<*.c>, <*> etc. is shorthand for the glob built-in function. So <*> returns a list of all files and directories in the current directory. (Except those beginning with a dot; use <* .*> for that).
<$fh> is shorthand for calling readline($fh). If no filehandle is specified (<>) the magical *ARGV handle is assumed, which is a list of files specified as command line arguments, or standard input if none are provided. As you mention, the perldoc covers both in detail.
How does Perl distinguish the two? It checks if the thing inside <> is either a bare filehandle or a simple scalar reference to a filehandle (e.g. $fh). Otherwise, it calls glob() instead. This even applies to stuff like <$hash{$key}> or <$x > - it will be interpreted as a call to glob(). If you read the perldoc a bit further on, this is explained - and it's recommended that you use glob() explicitly if you're putting a variable inside <> to avoid these problems.
It collects all filenames in the current directory and save them to the array collection. Except those beginning with a dot. It's the same as:
#collection = glob "*";

Why is parenthesis optional only after sub declaration?

(Assume use strict; use warnings; throughout this question.)
I am exploring the usage of sub.
sub bb { print #_; }
bb 'a';
This works as expected. The parenthesis is optional, like with many other functions, like print, open etc.
However, this causes a compilation error:
bb 'a';
sub bb { print #_; }
String found where operator expected at t13.pl line 4, near "bb 'a'"
(Do you need to predeclare bb?)
syntax error at t13.pl line 4, near "bb 'a'"
Execution of t13.pl aborted due to compilation errors.
But this does not:
bb('a');
sub bb { print #_; }
Similarly, a sub without args, such as:
special_print;
my special_print { print $some_stuff }
Will cause this error:
Bareword "special_print" not allowed while "strict subs" in use at t13.pl line 6.
Execution of t13.pl aborted due to compilation errors.
Ways to alleviate this particular error is:
Put & before the sub name, e.g. &special_print
Put empty parenthesis after sub name, e.g. special_print()
Predeclare special_print with sub special_print at the top of the script.
Call special_print after the sub declaration.
My question is, why this special treatment? If I can use a sub globally within the script, why can't I use it any way I want it? Is there a logic to sub being implemented this way?
ETA: I know how I can fix it. I want to know the logic behind this.
I think what you are missing is that Perl uses a strictly one-pass parser. It does not scan the file for subroutines, and then go back and compile the rest. Knowing this, the following describes how the one pass parse system works:
In Perl, the sub NAME syntax for declaring a subroutine is equivalent to the following:
sub name {...} === BEGIN {*name = sub {...}}
This means that the sub NAME syntax has a compile time effect. When Perl is parsing source code, it is working with a current set of declarations. By default, the set is the builtin functions. Since Perl already knows about these, it lets you omit the parenthesis.
As soon as the compiler hits a BEGIN block, it compiles the inside of the block using the current rule set, and then immediately executes the block. If anything in that block changes the rule set (such as adding a subroutine to the current namespace), those new rules will be in effect for the remainder of the parse.
Without a predeclared rule, an identifier will be interpreted as follows:
bareword === 'bareword' # a string
bareword LIST === syntax error, missing ','
bareword() === &bareword() # runtime execution of &bareword
&bareword === &bareword # same
&bareword() === &bareword() # same
When using strict and warnings as you have stated, barewords will not be converted into strings, so the first example is a syntax error.
When predeclared with any of the following:
sub bareword;
use subs 'bareword';
sub bareword {...}
BEGIN {*bareword = sub {...}}
Then the identifier will be interpreted as follows:
bareword === &bareword() # compile time binding to &bareword
bareword LIST === &bareword(LIST) # same
bareword() === &bareword() # same
&bareword === &bareword # same
&bareword() === &bareword() # same
So in order for the first example to not be a syntax error, one of the preceding subroutine declarations must be seen first.
As to the why behind all of this, Perl has a lot of legacy. One of the goals in developing Perl was complete backwards compatibility. A script that works in Perl 1 still works in Perl 5. Because of this, it is not possible to change the rules surrounding bareword parsing.
That said, you will be hard pressed to find a language that is more flexible in the ways it lets you call subroutines. This allows you to find the method that works best for you. In my own code, if I need to call a subroutine before it has been declared, I usually use name(...), but if that subroutine has a prototype, I will call it as &name(...) (and you will get a warning "subroutine called too early to check prototype" if you don't call it this way).
The best answer I can come up with is that's the way Perl is written. It's not a satisfying answer, but in the end, it's the truth. Perl 6 (if it ever comes out) won't have this limitation.
Perl has a lot of crud and cruft from five different versions of the language. Perl 4 and Perl 5 did some major changes which can cause problems with earlier programs written in a free flowing manner.
Because of the long history, and the various ways Perl has and can work, it can be difficult for Perl to understand what's going on. When you have this:
b $a, $c;
Perl has no way of knowing if b is a string and is simply a bareword (which was allowed in Perl 4) or if b is a function. If b is a function, it should be stored in the symbol table as the rest of the program is parsed. If b isn't a subroutine, you shouldn't put it in the symbol table.
When the Perl compiler sees this:
b($a, $c);
It doesn't know what the function b does, but it at least knows it's a function and can store it in the symbol table waiting for the definition to come later.
When you pre-declare your function, Perl can see this:
sub b; #Or use subs qw(b); will also work.
b $a, $c;
and know that b is a function. It might not know what the function does, but there's now a symbol table entry for b as a function.
One of the reasons for Perl 6 is to remove much of the baggage left from the older versions of Perl and to remove strange things like this.
By the way, never ever use Perl Prototypes to get around this limitation. Use use subs or predeclare a blank subroutine. Don't use prototypes.
Parentheses are optional only if the subroutine has been predeclared. This is documented in perlsub.
Perl needs to know at compile time whether the bareword is a subroutine name or a string literal. If you use parentheses, Perl will guess that it's a subroutine name. Otherwise you need to provide this information beforehand (e.g. using subs).
The reason is that Larry Wall is a linguist, not a computer scientist.
Computer scientist: The grammar of the language should be as simple & clear as possible.
Avoids complexity in the compiler
Eliminates sources of ambiguity
Larry Wall: People work differently from compilers. The language should serve the programmer, not the compiler. See also Larry Wall's outline of the three virtues of a programmer.