Hash randomization in Perl 5 - perl

When Perl 5.8.1 came out it added hash randomization. My understanding was that Perl 5.8.2 then removed hash randomization unless an environment variable (PERL_HASH_SEED) was present. It now seems as if I am gravely mistaken, as
PERL_HASH_SEED=$SEED perl -MData::Dumper -e 'print Dumper{map{$_,1}"a".."z"}'
always kicks back the same key ordering regardless of the value of $SEED.
Did hash randomization go completely away, am I doing something wrong, or is this a bug?

See Algorithmic Complexity Attacks:
In Perl 5.8.1 the hash function is randomly perturbed by a pseudorandom seed which makes generating such naughty hash keys harder. [...] but as of 5.8.2 it is only used on individual hashes if the internals detect the insertion of pathological data.
So randomization doesn't always happen, only when perl detects that it's needed.

At a minimum there have been some sloppy documentation updates. In the third paragraph of perlrun's entry for PERL_HASH_SEED it says:
The default behaviour is to randomise unless the PERL_HASH_SEED is set.
which was true only in 5.8.1 and contradicts the paragraph immediately preceding it:
Most hashes by default return elements in the same order as in Perl 5.8.0. On a hash by hash basis, if pathological data is detected during a hash key insertion, then that hash will switch to an alternative random hash seed.
perlsec's entry for Algorithmic Complexity Attacks gets this right:
In Perl 5.8.1 the random perturbation was done by default, but as of
5.8.2 it is only used on individual hashes if the internals detect the
insertion of pathological data.
perlsec goes on to say
If one wants for some reason emulate the old behaviour [...] set the
environment variable PERL_HASH_SEED to zero to disable the
protection (or any other integer to force a known perturbation, rather
than random).
[emphasis added]
Since setting PERL_HASH_SEED does not affect the hash order, I'd call it a bug. Searching for "PERL_HASH_SEED" on rt.perl.org didn't return any results, so it doesn't appear to be a "known" issue.
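As an aside, perlsec's same section also mentions Hash::Util::hash_seed(). Checking what that reports under different PERL_HASH_SEED values would make useful supporting data for a report, on the assumption that it reflects the seed the interpreter actually picked up at startup:
PERL_HASH_SEED=42 perl -MHash::Util=hash_seed -e 'print hash_seed(), "\n"'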

Related

Perl: How to free memory allocated for a scalar without access to the Perl variable?

This question is related to an answer to a former question about memory handling by Perl. I've learned that one can free memory in Perl by explicitly using the undef function on an accessible scalar, and that using Devel::Peek or Devel::Size or such one can see how much memory is allocated for a scalar. In all those cases the scalars debugged are used within their scope.
But is it possible to debug things like allocated memory outside the scope of variables, just on the level of a Perl interpreter? Something like searching for all allocated memory for all "things" that are a scalar in the current interpreter and print their associated data, like current value or such?
And if that's the case, if one does already have that information, is one even able to free the known memory? Just like calling undef on a scalar, but without the scalar, something more low level, like on those "things" output of Devel::Peek.
What I'm thinking about is having a mod_perl cleanup handler executed after a request, scanning the current mod_perl interpreter for large chunks of data and freeing them manually. Simply because I decide that large blocks of allocated data are of no use anymore, even if Perl thinks otherwise:
Finally and perhaps the biggest win is memory re-use: as calls are made into Perl subroutines, memory allocations are made for variables when they are used for the first time. Subsequent use of variables may allocate more memory, e.g. if a scalar variable needs to hold a longer string than it did before, or an array has new elements added. As an optimization, Perl hangs onto these allocations, even though their values "go out of scope".
https://perl.apache.org/docs/2.0/user/intro/overview.html#Threads_Support
I could find a lot of monitoring and debugging packages around low level memory access, but no hint yet how one could call something like the undef function on some low level Perl struct in Perl. Might simply not be possible without any XS or such...
is it possible to debug things like allocated memory outside the scope of variables
There really isn't any such memory. Any memory allocated outside of variables is surely needed. As you yourself point out, it's the memory allocated for variables that makes up most of the "wasted" space.
but no hint yet how one could call something like the undef function on some low level Perl struct in Perl.
It's because there are no such structs.
Just like calling undef on a scalar, but without the scalar, something more low level, like on those "things" output of Devel::Peek.
Devel::Peek's main function, Dump, outputs the contents of variables. Like you've said, undef is what you'd use to clear these.
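To make that concrete, a small illustration (the variable and size are made up):
use Devel::Peek;

my $x = 'x' x (1024 * 1024);   # scalar holding a 1 MB string buffer
Dump($x);                      # shows the SV, including the allocated PV buffer
undef $x;                      # releases the buffer; a plain $x = '' would keep it
Dump($x);                      # now shows an undefined SV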
From the above, it's obvious you want to know how to free the memory associated with the variables in subs.
You also overlooked the fact that many operators have an associated variable (called "target") in which they return their result.
Approach 1
A simple way to clear all those variables would be to selectively clear the symbol table (%::). This would effectively "unload" every module. Be sure not to clear core components (perl -E'say for sort keys %::' shows what lives there). And don't forget to clear %INC so the modules can be reloaded.
If clearing the symbol table is the approach you want to take, it might be less risky and time-consuming to take a snapshot of %:: early on, and restore that snapshot when it's time to clear the symbol table.
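For illustration, a rough and untested sketch of that snapshot-and-restore idea (deleting stash entries is inherently risky, and core or XS-backed globs should be left alone):
my (%stash_snapshot, %inc_snapshot);

sub take_snapshot {
    %stash_snapshot = %::;     # shallow copy of the main symbol table
    %inc_snapshot   = %INC;
}

sub restore_snapshot {
    # Drop only the stash entries created since the snapshot was taken.
    for my $name (keys %::) {
        delete $::{$name} unless exists $stash_snapshot{$name};
    }
    # Forget modules loaded since then so they can be require'd again.
    for my $file (keys %INC) {
        delete $INC{$file} unless exists $inc_snapshot{$file};
    }
}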
Approach 2
If you didn't want to reload the modules, you could attempt to locate every sub, undef its variables, and then undef the variables of its ops (the "targets" mentioned above).
A sub's variables exist within its pads. Conveniently, so do opcode targets. There's a pad for each level of recursion the sub has experienced.
Given a reference to a sub, you can find the variables in its pads. You can refer to PadWalker for an example of how to do this. You can't actually use PadWalker itself, since it only returns one variable per variable name, even if there is more than one (due to more than one variable being declared with the same name, or due to recursion).
Captured variables and our variables should be left untouched. It's possible to detect whether a pad entry is one of these. (Again, refer to PadWalker.)
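To see what that machinery exposes, here is a hedged sketch using PadWalker's documented peek_sub and closed_over functions; a real implementation would have to walk the pads itself, for the reasons given above:
use PadWalker qw(peek_sub closed_over);

sub example {
    my $big  = 'x' x 1_000_000;
    my @list = (1 .. 10);
    return;
}

my $pad      = peek_sub(\&example);      # e.g. { '$big' => \$big, '@list' => \@list }
my $captured = closed_over(\&example);   # captured variables; leave these alone
for my $name (keys %$pad) {
    next if exists $captured->{$name};
    # Here one would undef ${ $pad->{$name} } or @{ $pad->{$name} } as appropriate.
}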
(Obviously, you could also look into freeing the sub's extra pads!)
How do you find all the subs? Well, navigating the symbol table will give you most of them. Finding anon ones will be trickier.
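A hedged sketch of the symbol-table walk for the named ones might look like this (anonymous subs, as noted, are not reachable this way):
# Recursively collect code refs from a package stash and its nested packages.
sub find_named_subs {
    my ($pkg, $seen) = @_;
    $pkg  ||= 'main::';
    $seen ||= {};
    return if $seen->{$pkg}++;              # guard against stash cycles (e.g. main::main::)

    no strict 'refs';
    my @subs;
    for my $name (keys %{$pkg}) {
        if ($name =~ /::\z/) {              # nested package stash
            push @subs, find_named_subs("$pkg$name", $seen);
        }
        elsif (defined &{"$pkg$name"}) {
            push @subs, \&{"$pkg$name"};
        }
    }
    return @subs;
}

my @code_refs = find_named_subs();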
Approach 3
The most efficient approach is to simply terminate the mod_perl thread/process. A new clean one will automatically be spawned. It's also the simplest to implement, as it's simply a configuration change (setting MaxRequestsPerChild to 1).
Another form of wasted memory is a memory leak. That's another large question, so I'm not touching it.
I think you are looking for this answer to a similar question.
Everything you really need to know can be found in the internals of the Devel::MAT::* submodules, namely Devel::MAT::Dumper.xs, which knows the structure of the perl interpreter's heap. The module is designed to dump the heap on a signal and analyze it later, but I think you can turn it into a runtime check. If you need help reading the XS, look here.
To debug memory allocation, you can recompile perl with -Accflags=-DPERL_MEM_LOG (see the PERL_MEM_LOG documentation)
(see the related question about how to recompile perl).
You may also be interested in the MEMORY DEBUGGERS documentation.
To free a Perl scalar, just as when it leaves its scope:
{             # scope entered
    my $x;    # memory allocated for $x
}             # << HERE $x is freed
you just need to reduce the variable's reference count (REFCNT) to zero with the SvREFCNT_dec macro (documented in perlguts):
To free an SV that you've created, call SvREFCNT_dec(SV*) . Normally this call is not necessary (see Reference Counts and Mortality).
Here is the Perl-side pseudo code:
{
    my $x = 'some large value';
    call_xs_sub( $x );     # << HERE $x is freed by the XS call
}
XS pseudo code:
call_xs_sub( SV *sv ) {
    ...
    SvREFCNT_dec( sv );    # << HERE the scalar is freed
    ...
}
To inspect every allocated SV, you can walk perl's arenas.
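If walking the arenas from pure Perl is acceptable, Devel::Gladiator on CPAN wraps exactly that; a hedged sketch, combined with Devel::Size from the question, to spot the large allocations (module APIs as I recall them, so treat this as a sketch):
use Devel::Gladiator qw(walk_arena);
use Devel::Size qw(total_size);

my $all = walk_arena();     # arrayref of references to every live SV
for my $sv (@$all) {
    my $size = total_size($sv);
    printf "%s: %d bytes\n", ref($sv), $size if $size > 1_000_000;
}
@$all = ();                 # drop the references walk_arena handed back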
At compile time you can view every place where a variable is declared and accessed with the help of the B::Xref module.
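For example (the cross-reference report goes to standard output and can get large):
perl -MO=Xref script.pl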
Or run perl with the -Dm option (perl must be compiled with the corresponding debugging options; see this topic):
perl -Dm script.pl

How do I change the default ONE_AT_A_TIME_HARD hash function in Perl 5.18?

I'm not really familiar with Perl, but I've been searching in the documentation and other sources without success for the last 2 days. In the documentation, it is written:
Perl v5.18 includes support for multiple hash functions, and changed the default (to ONE_AT_A_TIME_HARD), you can choose a different algorithm by defining a symbol at compile time. For a current list, consult the INSTALL document. Note that as of Perl v5.18 we can only recommend use of the default or SIPHASH. All the others are known to have security issues and are for research purposes only.
The thing is that neither in the INSTALL document nor in other sources/sites can I find how to define this symbol.
What I want to do is to change the default ONE_AT_A_TIME_HARD hash function to ONE_AT_A_TIME_OLD so I can simulate the old Perl 5.16 behavior.
This sounds like an XY problem. What are you trying to accomplish by forcibly downgrading the hash algorithm in perl to one that has known problems?
From comments:
I need to run a lot of test cases written for perl 5.16 whose functionality depends on the old hash implementation, and it's quite impossible to change the code as there are hundreds of cases.
Whew, that's bad news. Find those developers, and hit them around the head with a copy of perldata:
Hashes are unordered collections of scalar values indexed by their associated string key.
Specifically - if this is a problem for you, it means your codebase treats hashes as ordered, when they aren't and never were. (It's just that they were fairly consistent before 5.18 and more random afterwards.)
From perldelta:
When encountering these changes, the key to cleaning up from them is to accept that hashes are unordered collections and to act accordingly.
See: http://blog.booking.com/hardening-perls-hash-function.html
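In practice the cleanup is usually mechanical: wherever the tests walk a hash and compare against a fixed expectation, impose the order in the code instead of relying on perl. A hypothetical before/after (%h, %got and %expected are placeholders; is_deeply is Test::More's):
# Before: output order silently depended on the hash function.
print "$_=$h{$_}\n" for keys %h;

# After: order is pinned by the code, not by perl's hash internals.
print "$_=$h{$_}\n" for sort keys %h;

# Or, when comparing whole structures in tests, compare sorted key lists:
is_deeply( [ sort keys %got ], [ sort keys %expected ], 'same keys' );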
To answer your question - if you really must:
./Configure -des -Accflags=-DPERL_HASH_FUNC_ONE_AT_A_TIME_OLD && make && make test
But it's a very very bad idea, because as the INSTALL file in your perl source package points out:
Note that as of Perl 5.18 we can only recommend the use of default or SIPHASH. All the others are known to have security issues and are for research purposes only.
By building your perl this way you introduce a known security flaw for every perl program using it.
Note - ONE_AT_A_TIME_HARD is the new default, so defining that symbol would not change how perl 5.18 behaves. The symbol you want for the old behaviour is PERL_HASH_FUNC_ONE_AT_A_TIME_OLD, as used above.
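If rebuilding perl is not an option, perlrun for 5.18 also documents runtime switches that at least make the key order repeatable between runs. This gives a stable order, not the 5.16 order, so it may or may not be enough for the tests (the script name here is a placeholder):
PERL_HASH_SEED=0 PERL_PERTURB_KEYS=0 perl your_tests.pl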

Why doesn't map read from @ARGV/@_?

Is there a good reason for map to not read from @_ (in functions) or @ARGV (anywhere else) when not given an argument list?
I can't say why Larry didn't make map, grep and the other list functions operate on @_ like pop and shift do, but I can tell you why I wouldn't. Default variables used to be in vogue, but Perl programmers have discovered that most of the "default" behaviors cause more problems than they solve. I doubt they would make it into the language today.
The first problem is remembering what a function does when passed no arguments. Does it act on a hidden variable? Which one? You just have to know by rote, and that makes it a lot more work to learn, read and write the language. You're probably going to get it wrong and that means bugs. This could be mitigated by Perl being consistent about it (ie. ALL functions which take lists operate on @_ and ALL functions which take scalars operate on $_) but there's more problems.
The second problem is the behavior changes based on context. Take some code outside of a subroutine, or put it into a subroutine, and suddenly it works differently. That makes refactoring harder. If you made it work on just @_ or just @ARGV then this problem goes away.
Third is default variables have a tendency to be quietly modified as well as read. $_ is dangerous for this reason, you never know when something is going to overwrite it. If the use of @_ as the default list variable were adopted, this behavior would likely leak in.
Fourth, it would probably lead to complicated syntax problems. I'd imagine this was one of the original reasons keeping it from being added to the language, back when $_ was in vogue.
Fifth, @ARGV as a default makes some sense when you're writing scripts that primarily work with @ARGV... but it doesn't make any sense when working on a library. Perl programmers have shifted from writing quick scripts to writing libraries.
Sixth, using $_ as default is a way of chaining together scalar operations without having to write the variable over and over again. This might have been mitigated if Perl was more consistent about its return values, and if regexes didn't have special syntax, but there you have it. Lists can already be chained, map { ... } sort { ... } grep /.../, @foo, so that use case is handled by a more efficient mechanism.
Finally, it's of very limited use. It's very rare that you want to pass @_ to map and grep. The problems with hidden defaults are far greater than avoiding typing two characters. This space savings might have made slightly more sense when Perl was primarily for quick and dirty work, but it makes no sense when writing anything beyond a few pages of code.
PS: shift defaulting to @_ has found a niche in my $self = shift, but I find this only shines because Perl's argument handling is so poor.
The map function takes in a list, not an array. shift takes an array. With lists, on the other hand, @_/@ARGV may or may not be fair defaults.
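A tiny illustration of the asymmetry being discussed (sub names are made up):
sub first_arg {
    my $x = shift;            # shift defaults to @_ inside a sub
    return $x;
}

sub upper_args {
    return map { uc } @_;     # map has no default; the list must be spelled out
}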

Perfect Hash Function for Perl (like gperf)?

I'm going to be using a key:value store and would like to create non-collidable hashes in Perl. Is there a Perl module, or function that I can use to generate a non-collidable hash function or table (maybe something like gperf)? I already know my range of input values.
I can't find a pure Perl solution, closest is Reini Urban's examinations of using perfect hashes with a type system. If you were to do it in XS, the CMPH (C Minimal Perfect Hashing Library) might be more apropos than gperf. CMPH seems to be optimized for non-trivial key sizes and run-time generation.
The cost of generating a perfect hash function at runtime in Perl might swamp the value of using it. In order to gain benefit, you'd want it compiled and cached. So again, writing an XS module which generates the function from a fixed key list at XS compile time might be the best way to go.
Out of curiosity, how big is your data and how many keys does the set contain?
You might be interested in Judy. It's not a hash table implementation, but it's supposedly a very efficient associative array implementation.
Mind you, Perl's hashes are very well tuned, and they automatically get rehashed when a bucket starts growing large.
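If the worry is mainly rehashing cost for a known key set, note that a plain hash can also be pre-sized before loading, which is a cheap middle ground before reaching for XS or Judy (the key list here is a placeholder):
my @known_keys = map { "key$_" } 1 .. 10_000;   # placeholder key set

my %lookup;
keys(%lookup) = scalar @known_keys;    # pre-allocate buckets (see perldoc -f keys)
@lookup{@known_keys} = (1) x @known_keys;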

Saving Perl Windows Environment Keys UPCASES them

I have a framework written in Perl that sets a bunch of environment variables to support interprocess (typically sub-process) communication. We keep sets of key/value pairs in XML-ish files. We tried to make the key names camel-case, somethingLikeThis. This all works well.
Recently we have had occasion to pass control (chain) processes from Windows to UNIX. When we spit out the %ENV hash to a file from Windows, the somethingLikeThis key becomes SOMETHINGLIKETHIS. When the Unix process picks up the file, reloads the environment, and looks up the value of $ENV{somethingLikeThis}, it does not exist, since UNIX is case sensitive (from the Windows side the same code works fine).
We have since gone back and changed all the keys to UPPERCASE and solved the problem, but that was tedious and caused pain to the users. Is there a way to make Perl on Windows preserve the character case of the keys of the environment hash?
I believe that you'll find the Windows environment variables are actually case insensitive, thus the keys are uppercase in order to avoid confusion.
This way Windows scripts which don't have any concept of case sensitivity can use the same variables as everything else.
As far as I remember, using ALL_CAPS for environment variables is the recommended practice in both Windows and *NIX worlds. My guess is Perl is just using some kind of legacy API to access the environment, and thus only retrieves the upper-case-only name for the variable.
In any case, you should never rely on something like that, even more so if you are asking your users to set up the variables; just imagine how much aggravation and confusion a simple misspelt variable would produce! You have to remember that some OSes that shall remain nameless still have not learned how to do case-sensitive file names...
First, to solve your problem, I believe using backticks around set and parsing it yourself will work. On my Windows system, this script worked just fine.
my %env = map {/(.*?)=(.*)/;} `set`;
print join(' ', sort keys %env);
In the camel book, the advice in Chapter 25: Portable Perl, the System Interaction section is "Don't depend on a specific environment variable existing in %ENV, and don't assume that anything in %ENV will be case sensitive or case preserving. Don't assume Unix inheritance semantics for environment variables; on some systems, they may be visible to all other processes."
Jack M.: Agreed, it is not a problem on Windows. If I create an environment variable Foo, I can reference it in Perl as $ENV{FOO} or $ENV{fOO} or $ENV{foo}. The problem is: I create it as Foo, dump the entire %ENV to a file, and then read in the file from *NIX to recreate the environment hash; when the same script references $ENV{Foo}, that hash value does not exist (though $ENV{FOO} does).
We had adopted the all UPPERCASE workaround that davidg suggested. I was just wondering if there was ANY way to "preserve case" when writing out the keys to the %ENV hash from Perl on Windows.
To the best of my knowledge, there is not. It seems that you may be better off using another hash instead of %ENV. If you are calling many outside modules and want to track the same variables across them, a Factory pattern may work so that you're not breaking DRY, and are able to use a case-sensitive hash across multiple modules. The only trick would then be to keep these variables updated across all objects from the Factory, but I'm sure you can work that out.
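For what it's worth, a hedged sketch of that "separate hash" idea; the file name and key names are made up, and the point is simply that case survives because %ENV never enters the round trip:
use strict;
use warnings;

# Writer side (Windows): dump the case-sensitive pairs yourself.
my %settings = ( somethingLikeThis => 'value1', anotherKeyLikeThat => 'value2' );
open my $out, '>', 'chain_settings.txt' or die "open: $!";
print {$out} "$_=$settings{$_}\n" for sort keys %settings;
close $out or die "close: $!";

# Reader side (UNIX): reload them, case intact.
open my $in, '<', 'chain_settings.txt' or die "open: $!";
my %restored = map { chomp; split /=/, $_, 2 } <$in>;
close $in;
print $restored{somethingLikeThis}, "\n";    # prints 'value1'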