Is there a standard way for perl to behave when it runs out of memory? - perl

Is there a standard(ish) way for a Perl interpreter (aka "perl") to behave when it runs out of memory? Is it documented/specced-out in any way? Coded in some uniform way?
I'm especially interested in any standards which are expressed as covenant to Perl code being run - e.g., will die be called? Will END block be executed? Etc...
I'm fine with both an "theoretical" answer (e.g. some sort of generic "this is what perl code ought to do in general on out-of-memory" mission statement document from Larry/P5P/etc..., even if not 100% of malloc() calls follow this rule); or a "practical" statement (e.g. all malloc() calls in Perl are wrapped into a generic "allocate_memory" function which uniformly handles all failures).
It may be possible that the answer depends on what specifically causes out of memory (e.g. a request for more memory for Perl code's data structure vs. memory allocated by internal Perl code unrelated to explicit "need to store more data" logic in Perl program).
If the answer is extremely implementation dependent, assume perl for Solaris/Linux,and narrowing down to any recent stable version (5.8 to 5.16) is acceptable.
The question is limited to standard Perl interpreter, however you wish to define that as far as pre-compile configuration (e.g. perl that comes with a major Linux distribution, or one compiled with all defaults left alone, etc...).
NOTE: This question came out of Gilles's comment to another Q

Taking a look at the manual page for the various diagnostic warnings that Perl will issue when the "use diagnostics" pragma is enabled, you can see various different types of "out of memory" errors and what they mean.
So you can infer the "standard" behavior from these messages; the one with an exclamation point ("Out of memory!") sounds like the one you're asking about:
Out of memory!
(X) The malloc() function returned 0, indicating there was
insufficient remaining memory (or virtual memory) to satisfy the
request. Perl has no option but to exit immediately.
An "X" level error is labeled as "A very fatal error (nontrappable)."
However if it's a "large request" (for greater than 64K) it is trappable (I guess Perl assumes it'll have enough memory to shutdown cleanly).

Related

How can set a hard maximum recursion depth in Perl?

Although in some cases I might want to allow deep recursions in my code, I want to be able to disable it in certain cases (like while testing).
I know that when using the debugger I can use $DB::deep to specify the maximum recursion depth, and the feature I'm after is basically the same but usable even when not in the debugger.
I took a look in CPAN, but I couldn't find anything. And a search on PerlMonks lead me to a thread about changing the behaviour of the deep recursion warning. What I'm after is a to be able to block recursions altogether (eg. die if the recursion gets too deep).
Does this feature exist?
Bonus points if the solution allows me to localise it, so that I can control the scope of a maximum recursion depth.
As a previous answer mentions, you can only change the level that triggers the warning, by recompiling Perl.
But you can make the existing warning fatal like this:
use warnings FATAL => 'recursion';
According to perldoc perldiag:
Deep recursion on subroutine "%s" (W recursion)
This subroutine has
called itself (directly or indirectly) 100 times more than it has
returned. This probably indicates an infinite recursion, unless you're
writing strange benchmark programs, in which case it indicates
something else.
This threshold can be changed from 100, by recompiling the perl
binary, setting the C pre-processor macro PERL_SUB_DEPTH_WARN to the
desired value.
So it seems you cannot localize the behavior unless you modify the perl binary.

What is the difference between subroutines and scripts in Perl?

I am in the midst of learning Perl, and I have encountered a question. What, exactly is the difference between subroutines and scripts?
A script is just a name for a (usually short) program usually contained in a single file. It's not really a scientific/technical term and therefore is pretty vague - people can refer to a "script" when discussing a 3-line quick program, or a 10000 lines of code program.
Some people refer to ANY Perl program as a "script" - see below for the historical reason. Some people, when they say "a Perl script" as opposed to a Perl "program", mean a relatively simple, relatively short program, frequently structured without using any subroutines/classes/other methods of code organization. Again, there's no standard definition.
As an aside, the reason why Perl programs are frequently called "scripts" is that Perl originally was used for writing scripts that perform work in Unix shell, the way shell scripting languages were used. The term "scripting language" means a language used to control an application, in this case Unix shell.
Of course, since then Perl has grown to become a full fledged programming language, but the word/term remained, sometimes used by inertia, sometimes derogatorily.
A subroutine (also known as a procedure, function, routine, method, or subprogram) is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code. It is frequently meant to contain code that performs the task which needs to be done several times in your program, or even by multiple programs.
A subroutine is NOT a Perl specific concept, though calling it "subroutine" is done in very few languages (most use the term function, method or procedure).
As a special side note, a "method" - in Perl as well as other languages - is a special type of subroutine which is associated with an object oriented class or an object of that class. The fact that it's merely a special case of a subroutine is, of course, highlighted by the fact that - despite deepest wishes by "Modern Perl" author chromatic - methods in Perl 5 are declared with "sub" keyword, same as regular subroutines.
As noted above, some people, when referring to a Perl program as a "script", imply that it does not contain subroutines (e.g. anything complicated enough to have a subroutine is no longer a "script" but a "program"). But that is not an accepted or formal definition - as stated, there is no definition of what a script is, everyone uses the term any which way they want.
A script is usually a file, which can contain statements and subroutines. A subroutine is something you find within a script.
Subroutines are described in detail in the perlsub manual page.

Can I use dtrace on OS X 10.5 to determine which of my perl subs is causing the most memory allocation?

We have a pretty big perl codebase.
Some processes that run for multiple hours (ETL jobs) suddenly started consuming a lot more RAM than usual. Analysis of the changes in the relevant release is a slow and frustrating process. I am hoping to identify the culprit using more automated analysis.
Our live environment is perl 5.14 on Debian squeeze.
I have access to lots of OS X 10.5 machines, though. Dtrace and perl seem to play together nicely on this platform. Seems that using dtrace on linux requires a boot more work. I am hoping that memory allocation patterns will be similar between our live system and a dev OS X system - or at least similar enough to help me find the origin of this new memory use.
This slide deck:
https://dgl.cx/2011/01/dtrace-and-perl
shows how to use dtrace do show number of calls to malloc by perl sub. I am interested in tracking the total amount of memory that perl allocates while executing each sub over the lifetime of a process.
Any ideas on how this can be done?
There's no single way to do this, and doing it on a sub-by-sub basis isn't always the best way to examine memory usage. I'm going to recommend a set of tools that you can use, some work on the program as a whole, others allow you to examine a single section of your code or a single variable.
You might want to consider using Valgrind. There's even a Perl module called Test::Valgrind that will help set up a suppression file for your Perl build, and then check for memory leaks or errors in your script.
There's also Devel::Size which does exactly what you asked for, but on a variable-by-variable basis rather than a sub-by-sub basis.
You can use Devel::Cycle to search for inadvertent circular memory references in complex data structures. While a circular reference doesn't mean that you're wasting memory as you use the object, circular references prevent anything in the chain from being freed until the cycle is broken.
Devel::Leak is a little bit more arcane than the rest, but it basically will allow you to get full information on any SVs that are created and not destroyed between two points in your program's execution. If you check this across a sub call, you'll know any new memory that that subroutine allocated.
You may also want to read the perldebguts section of the Perl manual.
I can't really help more because every codebase is going to wind up being different. Test::Valgrind will work great for some codebases and terribly on others. If you are going to try it, I recommend you use the latest version of Valgrind available and Perl >= 5.10, as Perl 5.8 and Valgrind historically didn't get along too well.
You might want to look at Memory::Usage and Devel::Size
To check the whole process or sub:
use Memory::Usage;
my $mu = Memory::Usage->new();
# Record amount of memory used by current process
$mu->record('starting work');
# Do the thing you want to measure
$object->something_memory_intensive();
# Record amount in use afterwards
$mu->record('after something_memory_intensive()');
# Spit out a report
$mu->dump();
Or to check specific variables:
use Devel::Size qw(size total_size);
my $size = size("A string");
my #foo = (1, 2, 3, 4, 5);
my $other_size = size(\#foo);
my $foo = {
a => [1, 2, 3],
b => {a => [1, 3, 4]}
};
my $total_size = total_size($foo);
The answer to the question is 'yes'. Dtrace can be used to analyze memory usage in a perl process.
This snippet of code:
https://github.com/astletron/perl-dtrace-malloc/blob/master/perl-malloc-total-bytes-by-sub.d
tracks how memory use increases between the call and return of every sub in a program. As an added bonus, dtrace seems to sort the output for you (at least on OS X). Cool.
Thanks to all that chimed in. I answered this one myself as the question is really specific to dtrace/perl.
You could write a simple debug module based on Devel::CallTrace that prints the sub entered as well as the current memory size of the current process. (Using /proc or whatever.)

"All programs are interpreted". How?

A computer scientist will correctly explain that all programs are
interpreted and that the only question is at what level. --perlfaq
How are all programs interpreted?
A Perl program is a text file read by the perl program which causes the perl program to follow a sequence of actions.
A Java program is a text file which has been converted into a series of byte codes which are then interpreted by the java program to follow a sequence of actions.
A C program is a text file which is converted via the C compiler into an assembly program which is converted into machine code by the assembler. The machine code is loaded into memory which causes the CPU to follow a sequence of actions.
The CPU is a jumble of transistors, resistors, and other electrical bits which is laid out by hardware engineers so that when electrical impulses are applied, it will follow a sequence of actions as governed by the laws of physics.
Physicists are currently working out what makes those rules and how they are interpreted.
Essentially, every computer program is interpreted by something else which converts it into something else which eventually gets translated into how the electrons in your local neighborhood fly around.
EDIT/ADDED: I know the above is a bit tongue-in-cheek, so let me add a slightly less goofy addition:
Interpreted languages are where you can go from a text file to something running on your computer in one simple step.
Compiled languages are where you have to take an extra step in the middle to convert the language text into machine- or byte-code.
The latter can easily be easily be converted into the former by a simple transformation:
Make a program called interpreted-c, which can take one or more C files and can run a program which doesn't take any arguments:
#!/bin/sh
MYEXEC=/tmp/myexec.$$
gcc -o $MYEXEC ${1+"$#"} && $MYEXEC
rm -f $MYEXEC
Now which definition does your C program fall into? Compare & contrast:
$ perl foo.pl
$ interpreted-c foo.c
Machine code is interpreted by the processor at runtime, given that the same machine code supplied to a processor of a certain arch (x86, PowerPC etc), should theoretically work the same regardless of the specific model's 'internal wiring'.
EDIT:
I forgot to mention that an arch may add new instructions for things like accessing new registers, in which case code written to use it won't work on older processors in the range. Much like when you try to use an old version of a library and then try to use capabilities only found in newer libraries.
Example: many Linux distros are released as 686 only, despite the fact it's in the 'x86 family'. This is due to the use of new instructions.
My first thought was too look inside the CPU — see below — but that's not right. The answer is much much simpler than that.
A high-level description of a CPU is:
1. execute the current op
2. grab the next op
3. goto 1
Compare it to Perl's interpreter:
while ((PL_op = op = op->op_ppaddr(aTHX))) {
}
(Yeah, that's the whole thing.)
There can be no doubt that the CPU is an interpreter.
It just goes to show how useless it is to classify something is interpreted or not.
Original answer:
Even at the CPU level, programs get rewritten into simpler instructions to allow the CPU to execute more them more quickly. This is done by changing the order in which they are executed and executing them in parallel. For example, Intel's Hyperthreading.
Even deeper, each instruction is considered a program of its own, one that routes electronic signals. See microcode.
The Levels of interpretions are really easy to explain:
2: Runtimelanguage (CLR, Java Runtime...) & Scriptlanguage (Python, Ruby...)
1: Assemblies
0: Binary Code
Edit: I changed the level of Scriptinglanguages to the same level of Runtimelanguages. Thank's for the hint. :-)
I can write a Game Boy interpreter that works similarly to how the Java Virtual Machine works, treating the z80 machine instructions as byte code. Assuming the original was written in C1, does that mean C suddenly became an interpreted language just because I used it like one?
From another angle, gcc can compile C into machine code for a number of different processors. There's no reason the target machine has to be the same as the machine you're compiling on. In fact, this is a common way to compile C code for AVRs and other microcontrollers.
As a matter of abstraction, the compiler's job is to translate flat text into a structure, then translate that structure into something that can be executed somewhere. Whatever is doing the execution may have its own levels of breaking out the structure before really executing it.
A lot of power becomes available once you start thinking along these lines.
A good book on this is Structure and Interpretation of Computer Programs. Even if you only get through the first chapter (or half of the first chapter), I think you'll learn a lot.
1 I think most Game Boy stuff was hand coded ASM, but the principle remains.

What are the uses of Perl's C translation backend?

Other than the purely obvious: "It translates Perl to C."; are there any real world uses (a.k.a. hacks) for the Perl compiler's optimized C translation backend, B::CC?
Not really. It means you can convert a (small) Perl script into a (big) C program, which will be much harder for the recipient to reverse engineer. In some paranoid circles, this might be accounted an advantage (for example, if your Perl code is embarrassingly bad and you'd rather conceal that fact from your paying customers). But mostly it is of limited to negative value.
Compiling a Perl program to an optree, which can then be executed, can take a while sometimes. You can safe some of that time by using perlcc with any of its backends. That'll, in one way or another, serialise the compiled optree and make loading it later, when executing your compiled binary, somewhat faster. I can see that being useful in, for example, CGI environments, for which, however, much better alternatives of avoiding startup costs are available.
Contrary to popular believe, perlcc doesn't make it very hard to reverse-engineer the resulting binary, as discussed in How can I reverse-engineer a Perl program that has been compiled with perlcc?