Use of uninitialized value $_ in hash element - perl

This is regarding a warning message I received when running a Perl script.
I understand why I'm receiving this warning: probably because $element is undefined when being called but I don't see it.
for ( my $element->{$_}; #previous_company_names; ) {
map { $element => $previous_company_names->{$_} }
0 .. $previous_company_names;
The result is this message
Use of uninitialized value $_ in hash element

First and foremost - for a new programmer, absolutely the most important thing you must do, is use strict; and use warnings;. You've got my in there, which suggests you might be, but it pays to re-iterate it.
$_ is a special variable, called the implicit variable. It doesn't really make sense to use it in the way you're doing like that, in a for loop. Take a look at perlvar for some more detail.
Indeed, I'd suggest steering clear of map entirely until you really grok it, because it's a good way to confuse yourself.
With a for (or foreach) loop you can either:
for my $thing ( #list_of_things ) {
print $thing;
}
Or you can do:
for ( #list_of_things ) {
print $_;
}
$_ is set implicitly by each iteration of the second loop, which can be quite useful because lots of things default to using it.
E.g.
for ( #list_of_things ) {
chomp;
s/ /_/g;
print;
}
When it comes to map - map is a clever little function, that lets you evaluate a code block for each element in a list. Personally - I still get confused by it, and tend to stick with for or foreach loops instead, most of the time.
But what you're doing with it, isn't really going to work - map makes a hash.
So something like:
use Data::Dumper;
my %things = map { $_ => 1 } 1..5;
print Dumper \%things;
This creates the hash 'things':
$VAR1 = {
'1' => 1,
'3' => 1,
'5' => 1,
'4' => 1,
'2' => 1
};
Again, $_ is used inside, because it's the magic variable - it's set to 'whatever was in the second bit' (e.g 1,2,3,4,5) each loop, and then the block is evaluated.
So your map expression doesn't really make a lot of sense, because you don't have $element defined... and even if you did, you'd repeatedly overwrite it.
I would also note - $previous_company_names would need to be numeric, and is in NO way related to #previous_company_names. You might be meaning to use $#previous_company_names which is the last element index.

Related

Why does combining Perl hashes in and each expression not work?

I recently encountered this and became most displeased:
while (my ($key, $value) = each (%hash1, %hash2)) {
}
Gave this error: Experimental each on scalar is now forbidden at ...
But this, which seems to be the same operation using a superfluous variable:
my %h = (%hash1, %hash2);
while (my ($key, $value) = each %h) {
}
Compiled and worked just fine.
What's the reason for this, and is my displeasure warranted?
There are a couple of issues that come up here. First, let's deal with your immediate problem.
my %h = (%hash1, %hash2);
while (my ($key, $value) = each(%hash1, %hash2)) { ... }
each is actually an ordinary function in Perl, not a special syntax or something. So, as far as Perl is concerned, you're calling each with two arguments, not one. each expects either an array or hash value (basically, something that begins with % or #), which is why each(%h) works. You can create a local hash and pass that using a bit more convoluted syntax
while (my ($key, $value) = each(%{{%hash1, %hash2}})) { ... }
Here, we use the hashref constructor to make a new hash {%hash1, %hash2}. This is a scalar value that happens to point to a hash. Then we immediately dereference it with %{...}. Unfortunately, this causes another problem. If you try to run this code, it'll compile fine but then infinitely loop forever. To see why this is, we'll need to take a brief tangent.
each is a bit of an oddball in Perl. It's actually stateful and stores the so-called state of its call in the hash object. So
my %h = (a => 1, b => 2);
say each(%h);
say each(%h);
These two calls to each will return different values. One will return ("a", 1) and the other will return ("b", 2) (the order of the two returns is unspecified).
Now, your while condition is going to run anew every time it loops, so if we create a temporary hash at every loop iteration, and Perl is trying to store its each state in the hash every time, then you'll never reach the end of iteration since you'll never iterate more than once on any given hash before it's erased and replaced with a new one.
My recommendation is to just use the temporary. Even if you could do it with each and the merged hash, you'd be making a new merged hash at every loop iteration. Alternatively, you can use keys to simply get all of the keys as a single list. This will only happen once, since it's happening as the head of a for loop.
for my $key (keys %{{%hash1, %hash2}}) {
my $value = $hash1{$key} // $hash2{$key};
...
}
Syntax for each:
each HASH
each ARRAY
This means
each %NAME
each %BLOCK
each EXPR->%*
each #NAME
each #BLOCK
each EXPR->#*
What you have does not match any of those patterns.
For a while, there was an experimental feature that allowed one to use
each EXPR
as long as the expression returned a reference to a hash or array.
The experiment was a failure, so this is no longer allowed. But your code wouldn't work even in a version of Perl with this feature. Your expression (%hash1, %hash2 in scalar context) returns the size of %hash2 or a weird string (depending on the version of Perl), and neither of those is a reference to a hash or a reference to an array.
Now, you might be tempted to use
each %{ { %hash1, %hash2 } }
Unfortunately, that creates a new hash each time it's evaluated, so you will perpetually get the first element of this new hash.

Why does this map block contain an apparently useless +?

While browsing the source code I saw the following lines:
my #files_to_keep = qw (file1 file2);
my %keep = map { + $_ => 1 } #files_to_keep;
What does the + do in this code snippet? I used Data::Dumper to see whether taking out the plus sign does anything, but the results were the same:
$ perl cleanme.pl
$VAR1 = {
'file1' => 1,
'file2' => 1
};
This is used to prevent a parsing problem. The plus symbol forces the interpreter to behave like a normal block and not an expression.
The fear is that perhaps you are trying to create a hashreference using the other (expression) formulation of map like so.
#array_of_hashrefs = map { "\L$_" => 1 }, #array
Notice the comma. Then if the parser guesses that you are doing this given the statement in the OP there will a syntax error for missing the comma! To see the difference try quoting "$_". For whatever reason, the parser takes this as enough to trigger the expression behavior.
Yes its an oddity. Therefore many extra-paranoid Perl programmers toss in the extra plus sign more often than needed (me included).
Here are the examples from the map documentation.
%hash = map { "\L$_" => 1 } #array # perl guesses EXPR. wrong
%hash = map { +"\L$_" => 1 } #array # perl guesses BLOCK. right
%hash = map { ("\L$_" => 1) } #array # this also works
%hash = map { lc($_) => 1 } #array # as does this.
%hash = map +( lc($_) => 1 ), #array # this is EXPR and works!
%hash = map ( lc($_), 1 ), #array # evaluates to (1, #array)
For a fun read (stylistically) and a case where the parser gets it wrong read this: http://blogs.perl.org/users/tom_wyant/2012/01/the-case-of-the-overloaded-curlys.html
The unary-plus operator simply returns its operand unchanged. Adding one doesn't even change the context.
In the example you gave, it is completely useless. But there are situations where it is useful to make the next token something that's undeniably an operator.
For example, map has two syntaxes.
map EXPR, LIST
and
map BLOCK LIST
A block starts with {, but so can an expression. For example, { } can be a block or a hash constructor.
So how can map tell the difference? It guesses. Which means it's sometimes wrong.
One occasion where is guesses wrong is the following:
map { $_ => 1 }, #list
You can prod it in to guessing correctly using + or ;.
map {; ... # BLOCK
map +{ ... # EXPR
So in this case, you could use
map +{ foo => $_ }, #list
Note that you could also use the following:
map({ foo => $_ }, #list)
Another example is when you omit the parens around arguments, and the first argument expression starts with a paren.
print ($x+$y)*2; # Same as: 2 * print($x+$y)
It can be fixed using
print +($x+$y)*2;
But why pile on a hack just to avoid parens? I prefer
print(($x+$y)*2);

Perl throws an error message about syntax

So, building off a question about string matching (this thread), I am working on implementing that info in solution 3 into a working solution to the problem I am working on.
However, I am getting errors, specifically about this line of the below function:
next if #$args->{search_in} !~ /#$cur[1]/;
syntax error at ./db_index.pl line 16, near "next "
My question as a perl newbie is what am I doing wrong here?
sub search_for_key
{
my ($args) = #_;
foreach $row(#{$args->{search_ary}}){
print "#$row[0] : #$row[1]\n";
}
my $thiskey = NULL;
foreach $cur (#{$args->{search_ary}}){
print "\n" . #$cur[1] . "\n"
next if #$args->{search_in} !~ /#$cur[1]/;
$thiskey = #$cur[0];
last;
}
return $thiskey;
}
You left off the semicolon at the end of the previous line. That's what caused the syntax error, anyway. I think you're also misusing $args, but it's hard to be sure about that without knowing how you're calling this function.
There are several issues here.
Are you adding use strict; and use warnings; at the top of your script before you do anything else? You only posted the sub, but it is clear that you are not using these.
What is NULL? (strict will not let you use bare-words...) Be sure to read What is Truth in Perl? The more Perly way is to deal with "truth" or "false" is defined / undef or exists or specifically test for a value chosen as a convention.
Missing ; after print "\n" . #$cur[1] . "\n"
Your data structures seem way too complicated. From what I can tell, you are passing a reference to a hash of arrays, true? Why your data structures get really obscure, back up and look at what you are trying to do...
Perl gives you plenty of way to shoot yourself in the foot. It is not strictly typed and you will do yourself (and your readers) a favor by naming references as a derivative of what they refer to. So instead of $args use $ref2HoArefs for example.
Side note, are you sure you can't just use a hash for what you're doing? It seems awfully complicated do do something so simple:
my %hash = (
key1 => 'value1',
key2 => 'value2',
);
exists $hash{$search_in}; # true/false.
my $result = $hash{$search_in}; # returns 'value1' when $search_in is 'key1'
Or if you need to search by value:
my %flip = reverse %hash;
$result = $flip{$search_in};
And if you really need a regex key ( or value ) lookup:
sub string_match {
my ($lookup_hash, $key ) = #_;
for my $hash_key ( %{ $lookup_hash } ){
return $hash_key if $key =~ $lookup_hash->{$hash_key};
}
return; # not found.
}
my $k = string_match({
'whitespace at end' => qr/\s+$/,
'whitespace at start' => qr/^\s+/,
}, "Some Garbage string "); # k == whitespace at end

How would I do the equivalent of Prototype's Enumerator.detect in Perl with the least amount of code?

Lately I've been thinking a lot about functional programming. Perl offers quite a few tools to go that way, however there's something I haven't been able to find yet.
Prototype has the function detect for enumerators, the descriptions is simply this:
Enumerator.detect(iterator[, context]) -> firstElement | undefined
Finds the first element for which the iterator returns true.
Enumerator in this case is any list while iterator is a reference to a function, which is applied in turn on each element of the list.
I am looking for something like this to apply in situations where performance is important, i.e. when stopping upon encountering a match saves time by disregarding the rest of the list.
I am also looking for a solution that would not involve loading any extra module, so if possible it should be done with builtins only. And if possible, it should be as concise as this for example:
my #result = map function #array;
You say you don't want a module, but this is exactly what the first function in List::Util does. That's a core module, so it should be available everywhere.
use List::Util qw(first);
my $first = first { some condition } #array;
If you insist on not using a module, you could copy the implementation out of List::Util. If somebody knew a faster way to do it, it would be in there. (Note that List::Util includes an XS implementation, so that's probably faster than any pure-Perl approach. It also has a pure-Perl version of first, in List::Util::PP.)
Note that the value being tested is passed to the subroutine in $_ and not as a parameter. This is a convenience when you're using the first { some condition} #values form, but is something you have to remember if you're using a regular subroutine. Some more examples:
use 5.010; # I want to use 'say'; nothing else here is 5.10 specific
use List::Util qw(first);
say first { $_ > 3 } 1 .. 10; # prints 4
sub wanted { $_ > 4 }; # note we're using $_ not $_[0]
say first \&wanted, 1 .. 10; # prints 5
my $want = \&wanted; # Get a subroutine reference
say first \&$want, 1 .. 10; # This is how you pass a reference in a scalar
# someFunc expects a parameter instead of looking at $_
say first { someFunc($_) } 1 .. 10;
Untested since I don't have Perl on this machine, but:
sub first(\&#) {
my $pred = shift;
die "First argument to "first" must be a sub" unless ref $pred eq 'CODE';
for my $val (#_) {
return $val if $pred->($val);
}
return undef;
}
Then use it as:
my $first = first { sub performing test } #list;
Note that this doesn't distinguish between no matches in the list and one of the elements in the list being an undefined value and having that match.
Just since its not here, a Perl function definition of first that localizes $_ for its block:
sub first (&#) {
my $code = shift;
for (#_) {return $_ if $code->()}
undef
}
my #array = 1 .. 10;
say first {$_ > 5} #array; # prints 6
While it will work fine, I don't advocate using this version, since List::Util is a core module (installed by default), and its implementation of first will usually use the XS version (written in C) which is much faster.

What's the safest way to iterate through the keys of a Perl hash?

If I have a Perl hash with a bunch of (key, value) pairs, what is the preferred method of iterating through all the keys? I have heard that using each may in some way have unintended side effects. So, is that true, and is one of the two following methods best, or is there a better way?
# Method 1
while (my ($key, $value) = each(%hash)) {
# Something
}
# Method 2
foreach my $key (keys(%hash)) {
# Something
}
The rule of thumb is to use the function most suited to your needs.
If you just want the keys and do not plan to ever read any of the values, use keys():
foreach my $key (keys %hash) { ... }
If you just want the values, use values():
foreach my $val (values %hash) { ... }
If you need the keys and the values, use each():
keys %hash; # reset the internal iterator so a prior each() doesn't affect the loop
while(my($k, $v) = each %hash) { ... }
If you plan to change the keys of the hash in any way except for deleting the current key during the iteration, then you must not use each(). For example, this code to create a new set of uppercase keys with doubled values works fine using keys():
%h = (a => 1, b => 2);
foreach my $k (keys %h)
{
$h{uc $k} = $h{$k} * 2;
}
producing the expected resulting hash:
(a => 1, A => 2, b => 2, B => 4)
But using each() to do the same thing:
%h = (a => 1, b => 2);
keys %h;
while(my($k, $v) = each %h)
{
$h{uc $k} = $h{$k} * 2; # BAD IDEA!
}
produces incorrect results in hard-to-predict ways. For example:
(a => 1, A => 2, b => 2, B => 8)
This, however, is safe:
keys %h;
while(my($k, $v) = each %h)
{
if(...)
{
delete $h{$k}; # This is safe
}
}
All of this is described in the perl documentation:
% perldoc -f keys
% perldoc -f each
One thing you should be aware of when using each is that it has
the side effect of adding "state" to your hash (the hash has to remember
what the "next" key is). When using code like the snippets posted above,
which iterate over the whole hash in one go, this is usually not a
problem. However, you will run into hard to track down problems (I speak from
experience ;), when using each together with statements like
last or return to exit from the while ... each loop before you
have processed all keys.
In this case, the hash will remember which keys it has already returned, and
when you use each on it the next time (maybe in a totaly unrelated piece of
code), it will continue at this position.
Example:
my %hash = ( foo => 1, bar => 2, baz => 3, quux => 4 );
# find key 'baz'
while ( my ($k, $v) = each %hash ) {
print "found key $k\n";
last if $k eq 'baz'; # found it!
}
# later ...
print "the hash contains:\n";
# iterate over all keys:
while ( my ($k, $v) = each %hash ) {
print "$k => $v\n";
}
This prints:
found key bar
found key baz
the hash contains:
quux => 4
foo => 1
What happened to keys "bar" and baz"? They're still there, but the
second each starts where the first one left off, and stops when it reaches the end of the hash, so we never see them in the second loop.
The place where each can cause you problems is that it's a true, non-scoped iterator. By way of example:
while ( my ($key,$val) = each %a_hash ) {
print "$key => $val\n";
last if $val; #exits loop when $val is true
}
# but "each" hasn't reset!!
while ( my ($key,$val) = each %a_hash ) {
# continues where the last loop left off
print "$key => $val\n";
}
If you need to be sure that each gets all the keys and values, you need to make sure you use keys or values first (as that resets the iterator). See the documentation for each.
Using the each syntax will prevent the entire set of keys from being generated at once. This can be important if you're using a tie-ed hash to a database with millions of rows. You don't want to generate the entire list of keys all at once and exhaust your physical memory. In this case each serves as an iterator whereas keys actually generates the entire array before the loop starts.
So, the only place "each" is of real use is when the hash is very large (compared to the memory available). That is only likely to happen when the hash itself doesn't live in memory itself unless you're programming a handheld data collection device or something with small memory.
If memory is not an issue, usually the map or keys paradigm is the more prevelant and easier to read paradigm.
A few miscellaneous thoughts on this topic:
There is nothing unsafe about any of the hash iterators themselves. What is unsafe is modifying the keys of a hash while you're iterating over it. (It's perfectly safe to modify the values.) The only potential side-effect I can think of is that values returns aliases which means that modifying them will modify the contents of the hash. This is by design but may not be what you want in some circumstances.
John's accepted answer is good with one exception: the documentation is clear that it is not safe to add keys while iterating over a hash. It may work for some data sets but will fail for others depending on the hash order.
As already noted, it is safe to delete the last key returned by each. This is not true for keys as each is an iterator while keys returns a list.
I always use method 2 as well. The only benefit of using each is if you're just reading (rather than re-assigning) the value of the hash entry, you're not constantly de-referencing the hash.
I may get bitten by this one but I think that it's personal preference. I can't find any reference in the docs to each() being different than keys() or values() (other than the obvious "they return different things" answer. In fact the docs state the use the same iterator and they all return actual list values instead of copies of them, and that modifying the hash while iterating over it using any call is bad.
All that said, I almost always use keys() because to me it is usually more self documenting to access the key's value via the hash itself. I occasionally use values() when the value is a reference to a large structure and the key to the hash was already stored in the structure, at which point the key is redundant and I don't need it. I think I've used each() 2 times in 10 years of Perl programming and it was probably the wrong choice both times =)
I usually use keys and I can't think of the last time I used or read a use of each.
Don't forget about map, depending on what you're doing in the loop!
map { print "$_ => $hash{$_}\n" } keys %hash;
I woudl say:
Use whatever's easiest to read/understand for most people (so keys, usually, I'd argue)
Use whatever you decide consistently throught the whole code base.
This give 2 major advantages:
It's easier to spot "common" code so you can re-factor into functions/methiods.
It's easier for future developers to maintain.
I don't think it's more expensive to use keys over each, so no need for two different constructs for the same thing in your code.