Is there any advantage to using keys @array instead of 0 .. $#array? - perl

I was quite surprised to find that the keys function happily works with arrays:
keys HASH
keys ARRAY
keys EXPR
Returns a list consisting of all the keys of the named hash, or the
indices of an array. (In scalar context, returns the number of keys or
indices.)
Is there any benefit in using keys @array instead of 0 .. $#array with respect to memory usage, speed, etc., or are the reasons for this functionality more of a historic origin?
Seeing that keys @array holds up to $[ modification, I'm guessing it's historic:
$ perl -Mstrict -wE 'local $[=4; my @array="a".."z"; say join ",", keys @array;'
Use of assignment to $[ is deprecated at -e line 1.
4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29

Mark has it partly right, I think. What he's missing is that each now works on an array, and, like the each with hashes, each with arrays returns two items on each call. Where each %hash returns key and value, each @array also returns key (index) and value.
while (my ($idx, $val) = each @array)
{
    if ($idx > 0 && $array[$idx-1] eq $val)
    {
        print "Duplicate indexes: ", $idx-1, "/", $idx, "\n";
    }
}
Thanks to Zaid for asking, and jmcnamara for bringing it up on perlmonks' CB. I didn't see this before - I've often looped through an array and wanted to know what index I'm at. This is waaaay better than manually manipulating some $i variable created outside of a loop and incremented inside, as I expect that continue, redo, etc., will survive this better.
So, because we can now use each on arrays, we need to be able to reset that iterator, and thus we have keys.
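A minimal sketch of that reset, assuming Perl 5.12 or later (where each and keys accept arrays). The array contents and variable names are made up for illustration:

```perl
use strict;
use warnings;
use 5.012;    # each/keys on arrays need Perl 5.12 or later

my @array = qw(a b c);

# Take one step with the iterator, then abandon it half-way.
my ($idx, $val) = each @array;    # (0, 'a')

# Without a reset, the next each would continue from index 1.
keys @array;                      # void context: only resets the iterator

my @seen;
while (my ($i, $v) = each @array) {
    push @seen, "$i=$v";
}
say join ",", @seen;              # 0=a,1=b,2=c
```

If the keys @array line is removed, the loop starts at index 1, because each remembers where the previous call stopped.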

The link you provided actually has one important reason you might use/not use keys:
As a side effect, calling keys() resets the internal iterator of the HASH or ARRAY (see each). In particular, calling keys() in void context resets the iterator with no other overhead.
That would cause each to reset to the beginning of the array. Using keys and each with arrays might be important if they ever natively support sparse arrays as a real data-type.
All that said, with so many array-aware language constructs like foreach and join in perl, I can't remember the last time I used 0..$#array.

I actually think you've answered your own question: it returns the valid indices of the array, no matter what value you've set for $[. So from a generality point of view (especially for library usage), it's preferable.
The version of Perl I have (5.10.1) doesn't support using keys with arrays, so it can't be for historic reasons.

Well, in your example you are putting them in a list, so they are evaluated in list context.
keys @array will be replaced with all the indices of the array,
whereas 0 .. $#array produces the same list but can also be used for array slicing. So instead of @array[0 .. $#array] you can also write @array[0 .. (some specific index)].

Related

Create a Perl hash with an array as the key

How can I put an array (like the tuple in the following example) into a hash in Perl?
%h=();
@a=(1,1);
$h{@a}=1 or $h{\@a}=1??
I tried with an array reference, but it does not work. How do I make it work? I want to essentially de-duplicate by doing the hashing (among other things with this).
Regular hashes can only have string keys, so you'd need to create some kind of hashing function for your arrays. A simple way would be to simply join your array elements, e.g.
$h{join('-', @a)} = \@a; # A nice readable separator
$h{join($;, @a)} = \@a; # A less likely, configurable separator ("\034")
But that approach (using a sentinel value) requires that you pick a character that won't be found in the keys. The following doesn't suffer from that problem:
$h{pack('(j/a*)*', @a)} = \@a;
Alternatively, check out Hash::MultiKey which can take a more complex key.
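To make the separator problem concrete, here is a small sketch comparing the join key with the pack key from above. The two sample lists are invented for illustration, and the unpack round trip at the end is an addition that is not part of the answer:

```perl
use strict;
use warnings;

my @x = ('a-b', 'c');
my @y = ('a', 'b-c');

# Joined keys collide whenever the separator appears inside an element.
my $joined_x = join '-', @x;    # "a-b-c"
my $joined_y = join '-', @y;    # "a-b-c"

# Length-prefixed packing keeps the element boundaries, so the keys differ.
my $packed_x = pack '(j/a*)*', @x;
my $packed_y = pack '(j/a*)*', @y;

print "join collides\n"    if $joined_x eq $joined_y;
print "pack is distinct\n" if $packed_x ne $packed_y;

# The packed key can be turned back into the original list.
my @round_trip = unpack '(j/a*)*', $packed_x;
```

The packed key is binary and not human-readable, so it suits lookups better than debugging output.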
I tried with array reference, but it does not work
Funny that, page 361 of the (new) Camel book has a paragraph title:
References Don't Work As Hash Keys
So yes, you proved the Camel book right. It then goes on to tell you how to fix it, using Tie::RefHash.
I guess you should buy the book.
(By the way, (1,1) might be called a tuple in Python, but it is called a list in Perl).
To remove duplicates in the array using hashes:
my %hash;
@hash{@array} = @array;
my @unique = keys %hash;
Alternatively, you can use map to create the hash:
my %hash = map {$_ => 1} @array;
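Both versions above return the unique values in whatever order keys happens to produce. If the original order matters, a common idiom is to filter with grep and a %seen hash. This is a sketch with made-up sample data, not something taken from the answers:

```perl
use strict;
use warnings;

my @array = qw(pear apple pear fig apple);

# %seen counts occurrences; grep keeps an element only the first time it appears.
my %seen;
my @unique = grep { !$seen{$_}++ } @array;

print "@unique\n";    # pear apple fig
```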

how to check if a hash is empty in perl

I use the following code to check if the hash is empty. Is there a better method and is this safe to use?
if (!keys %hash) { print "Empty";}
if (%hash)
Will work just fine.
From perldoc perldata:
If you evaluate a hash in scalar context, it returns false if the hash
is empty. If there are any key/value pairs, it returns true; more
precisely, the value returned is a string consisting of the number of
used buckets and the number of allocated buckets, separated by a
slash.
There was a bug which caused tied hashes in scalar context to always return false. The bug was fixed in 5.8.5. If you're concerned with backwards compatibility that far back I would stick with if( !keys %hash ). Otherwise use if( !%hash ) as recommended by others.
Simpler:
if (!%hash) {
print "Empty";
}
! imposes a scalar context, and a hash evaluated in scalar context returns:
  false if there are zero keys (not defined in the documentation, but experimentally it returns 0)
  for a non-empty hash, depending on the version of Perl, either of the following:
    a string signifying how many buckets are used out of how many are allocated (e.g. "3/6"), which is of course not false, since such a string evaluates to true
    the number of keys in the hash (as explained in perldata: "As of Perl 5.25 the return was changed to be the count of keys in the hash. If you need access to the old behavior you can use "Hash::Util::bucket_ratio()" instead.")
"Better" is a subjective term. However I would argue that code that is easier to understand can be described as "better". For this reason I conclude that !keys %hash is better, because everybody writing perl code will know what this code does and that it works. !%hash is something at least I would have to look up to ensure if it really works or only looks like it would work. (The reason being that the return value of a hash in scalar context is rather confusing while an arrays behavior in scalar context is well known and often used.)
Also, !keys %hash is safe.
So no, there is no better or safer way to check if a hash is empty.
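For comparison, a small sketch that runs both checks discussed in this thread against an empty and a non-empty hash. The hashes and the report strings are illustrative only:

```perl
use strict;
use warnings;

my %empty;
my %full = (a => 1);

my @report;
for my $h (\%empty, \%full) {
    # keys in a boolean test yields the number of keys;
    # the bare hash is true exactly when it has at least one pair.
    my $by_keys = keys(%$h) ? "has keys" : "empty";
    my $by_hash = %$h       ? "true"     : "false";
    push @report, "$by_keys/$by_hash";
}
print "$_\n" for @report;    # empty/false, then has keys/true
```

Both tests agree on ordinary hashes, so the choice between them is mostly about readability.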

Are Perl subroutines call-by-reference or call-by-value?

I'm trying to figure out Perl subroutines and how they work.
From perlsub I understand that subroutines are call-by-reference and that an assignment (like my(@copy) = @_;) is needed to turn them into call-by-value.
In the following, I see that change is called-by-reference because "a" and "b" are changed into "x" and "y". But I'm confused about why the array isn't extended with an extra element "z"?
use strict;
use Data::Dumper;
my @a = ( "a" ,"b" );
change(@a);
print Dumper(\@a);
sub change
{
    @_[0] = "x";
    @_[1] = "y";
    @_[2] = "z";
}
Output:
$VAR1 = [
'x',
'y'
];
In the following, I pass a hash instead of an array. Why isn't the key changed from "a" to "x"?
use strict;
use Data::Dumper;
my %a = ( "a" => "b" );
change(%a);
print Dumper(\%a);
sub change
{
    @_[0] = "x";
    @_[1] = "y";
}
Output:
$VAR1 = {
'a' => 'y'
};
I know the real solution is to pass the array or hash by reference using \@ or \%, but I'd like to understand the behaviour of these programs exactly.
Perl always passes by reference. It's just that sometimes the caller passes temporary scalars.
The first thing you have to realise is that the arguments of subs can be one and only one thing: a list of scalars.* One cannot pass arrays or hashes to them. Arrays and hashes are evaluated, returning a list of their content. That means that
f(@a)
is the same** as
f($a[0], $a[1], $a[2])
Perl passes by reference. Specifically, Perl aliases each of the arguments to the elements of @_. Modifying the elements of @_ will change the scalars returned by $a[0], etc. and thus will modify the elements of @a.
The second thing of importance is that the key of an array or hash element determines where the element is stored in the structure. Otherwise, $a[4] and $h{k} would require looking at each element of the array or hash to find the desired value. This means that the keys aren't modifiable. Moving a value requires creating a new element with the new key and deleting the element at the old key.
As such, whenever you get the keys of an array or hash, you get a copy of the keys. Fresh scalars, so to speak.
Back to the question,
f(%h)
is the same** as
f(
    my $k1 = "a", $h{a},
    my $k2 = "b", $h{b},
    my $k3 = "c", $h{c},
)
@_ is still aliased to the values returned by %h, but some of those are just temporary scalars used to hold a key. Changing those will have no lasting effect.
* — Some built-ins (e.g. grep) are more like flow control statements (e.g. while). They have their own parsing rules, and thus aren't limited to the conventional model of a sub.
** — Prototypes can affect how the argument list is evaluated, but it will still result in a list of scalars.
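The aliasing described in this answer can be checked with a short sketch. It mirrors the question's change sub but uses the scalar form $_[0] rather than the slice @_[0]; the variable names are otherwise arbitrary:

```perl
use strict;
use warnings;

sub change {
    $_[0] = "x";    # element 0 of @_ is an alias for the first argument
    $_[1] = "y";    # element 1 is an alias for the second argument
}

my @a = ("a", "b");
change(@a);         # same as change($a[0], $a[1])
print "@a\n";       # x y

my %h = (a => "b");
change(%h);         # passes a copy of the key "a", then an alias to $h{a}
print join(" => ", %h), "\n";    # a => y
```

The key survives because only a temporary copy of it was overwritten, while the value was reached through its alias.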
Perl's subroutines accept parameters as flat lists of scalars. An array passed as a parameter is for all practical purposes a flat list too. Even a hash is treated as a flat list of one key followed by one value, followed by one key, etc.
A flat list is not passed as a reference unless you do so explicitly. The fact that modifying $_[0] modifies $a[0] is because the elements of @_ become aliases for the elements passed as parameters. Modifying $_[0] is the same as modifying $a[0] in your example. But while this is approximately similar to the common notion of "pass by reference" as it applies to any programming language, this isn't specifically passing a Perl reference; Perl's references are different (and indeed "reference" is an overloaded term). An alias (in Perl) is a synonym for something, whereas a reference is similar to a pointer to something.
As perlsub states, if you assign to @_ as a whole, you break its alias status. Also note, if you try to modify $_[0], and $_[0] happens to be a literal instead of a variable, you'll get an error. On the other hand, modifying $_[0] does modify the caller's value if it is modifiable. So in example one, changing $_[0] and $_[1] propagates back to @a because each element of @_ is an alias for each element in @a.
Your second example is a little tricky. Hash keys are immutable. Perl doesn't provide a way to modify a hash key in place; you can only delete it and insert a new one. That means that $_[0] cannot be an alias for the key: it is a temporary copy. When you assign to $_[0], only that copy changes, so nothing propagates back to the hash and Perl has no reason to warn. You see, the flat list passed to the sub consists of a copied key followed by a modifiable value, etc. This is mostly a non-issue. I cannot think of any reason to modify individual elements of a hash in the way you're demonstrating; since hashes have no particular order you wouldn't have simple control over which elements in @_ propagate back to which values in %a.
As you pointed out, the proper protocol is to pass \@a or \%a, so that they can be referred to as $_[0]->[0] or $_[0]->{element}. Even though the notation is a little more complicated, it becomes second nature after a while, and is much clearer (in my opinion) as to what is going on.
Be sure to have a look at the perlsub documentation. In particular:
Any arguments passed in show up in the array @_. Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1]. The array @_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element which did not exist when the function was called, that element is created only when (and if) it is modified or a reference to it is taken. (Some earlier versions of Perl created the element whether or not the element was assigned to.) Assigning to the whole array @_ removes that aliasing, and does not update any arguments.
(Note that use warnings is even more important than use strict.)
@_ itself isn't a reference to anything, it is an array (really, just a view of the stack, though if you do something like take a reference to it, it morphs into a real array) whose elements each are an alias to a passed parameter. And those passed parameters are the individual scalars passed; there is no concept of passing an array or hash (though you can pass a reference to one).
So shifts, splices, additional elements added, etc. to @_ don't affect anything passed, though they may change the index of or remove from the array one of the original aliases.
So where you call change(@a), this puts two aliases on the stack, one to $a[0] and one to $a[1]. change(%a) is more complicated; %a flattens out into an alternating list of keys and values, where the values are the actual hash values and modifying them modifies what's stored in the hash, but where the keys are merely copies, no longer associated with the hash.
Perl does not pass the array or hash itself by reference; it unfurls the entries (the array elements, or the hash keys and values) into a list and passes this list to the function. @_ then allows you to access the scalars through aliases, much as if you held references to them.
This is roughly the same as writing:
@a = (1, 2, 3);
$b = \$a[2];
${$b} = 4;
# @a is now (1, 2, 4)
You'll note that in the first case you were not able to add an extra item to @a; all that happened was that you modified the members of @a that already existed. In the second case, the hash keys don't really exist in the hash as scalars, so these need to be created as copies in temporary scalars when the expanded list of the hash is created to be passed into the function. Modifying this temporary scalar will not modify the hash key, as it is not the hash key.
If you want to modify an array or hash in a function, you will need to pass a reference to the container:
change(\%foo);
sub change {
$_[0]->{a} = 1;
}
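The same approach answers the "extra element" part of the question. A sketch, assuming the caller passes \@a (the parameter name $array_ref is arbitrary):

```perl
use strict;
use warnings;

sub change {
    my ($array_ref) = @_;
    @{$array_ref}[0, 1] = ("x", "y");    # overwrite the existing elements
    push @{$array_ref}, "z";             # grow the caller's array
}

my @a = ("a", "b");
change(\@a);
print "@a\n";    # x y z
```

Because the sub holds a reference to the container itself, it can add or remove elements, which aliases to individual elements cannot do.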
Firstly, you are confusing the @ sigil as indicating an array. This is actually a list. When you call change(@a) you are passing the list to the function, not an array object.
The case with the hash is slightly different. Perl evaluates your call into a list and passes the values as a list instead.

What is the difference between `$this`, `@that`, and `%those` in Perl?

What is the difference between $this, @that, and %those in Perl?
A useful mnemonic for Perl sigils is:
$calar
@rray
%ash
Matt Trout wrote a great comment on blog.fogus.me about Perl sigils which I think is useful so have pasted below:
Actually, perl sigils don’t denote variable type – they denote conjugation – $ is ‘the’, @ is
‘these’, % is ‘map of’ or so – variable type is denoted via [] or {}. You can see this with:
my $foo = 'foo';
my @foo = ('zero', 'one', 'two');
my $second_foo = $foo[1];
my @first_and_third_foos = @foo[0,2];
my %foo = (key1 => 'value1', key2 => 'value2', key3 => 'value3');
my $key2_foo = $foo{key2};
my ($key1_foo, $key3_foo) = @foo{'key1','key3'};
so looking at the sigil when skimming perl code tells you what you’re going to -get- rather
than what you’re operating on, pretty much.
This is, admittedly, really confusing until you get used to it, but once you -are- used to it
it can be an extremely useful tool for absorbing information while skimming code.
You’re still perfectly entitled to hate it, of course, but it’s an interesting concept and I
figure you might prefer to hate what’s -actually- going on rather than what you thought was
going on :)
$this is a scalar value, it holds 1 item like apple
@that is an array of values, it holds several like ("apple", "orange", "pear")
%those is a hash of values, it holds key value pairs like ("apple" => "red", "orange" => "orange", "pear" => "yellow")
See perlintro for more on Perl variable types.
Perl's inventor was a linguist, and he sought to make Perl like a "natural language".
From this post:
Disambiguation by number, case and word order
Part of the reason a language can get away with certain local ambiguities is that other ambiguities are suppressed by various mechanisms. English uses number and word order, with vestiges of a case system in the pronouns: "The man looked at the men, and they looked back at him." It's perfectly clear in that sentence who is doing what to whom. Similarly, Perl has number markers on its nouns; that is, $dog is one pooch, and @dog is (potentially) many. So $ and @ are a little like "this" and "these" in English. [emphasis added]
People often try to tie sigils to variable types, but they are only loosely related. It's a topic we hit very hard in Learning Perl and Effective Perl Programming because it's much easier to understand Perl when you understand sigils.
Many people forget that variables and data are actually separate things. Variables can store data, but you don't need variables to use data.
The $ denotes a single scalar value (not necessarily a scalar variable):
$scalar_var
$array[1]
$hash{key}
The @ denotes multiple values. That could be the array as a whole, a slice, or a dereference:
@array
@array[1,2]
@hash{qw(key1 key2)}
@{ func_returning_array_ref() }
The % denotes pairs (keys and values), which might be a hash variable or a dereference:
%hash
%$hash_ref
Under Perl v5.20, the % can now denote a key/value slice of either a hash or an array:
%array[ @indices ]; # returns pairs of indices and elements
%hash{ @keys }; # returns pairs of key-values for those keys
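A short sketch of both key/value slice forms, assuming Perl 5.20 or later; the sample data and variable names are invented:

```perl
use strict;
use warnings;
use 5.020;    # key/value slices were added in Perl 5.20

my @array = qw(zero one two three);
my %hash  = (a => 1, b => 2, c => 3);

my @index_value = %array[1, 3];      # (1, 'one', 3, 'three')
my %subset      = %hash{qw(a c)};    # (a => 1, c => 3)

say "@index_value";                                          # 1 one 3 three
say join ",", map { "$_=$subset{$_}" } sort keys %subset;    # a=1,c=3
```

Note that the % sigil here describes what comes back (pairs), while the brackets still say whether an array or a hash is being sliced.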
You might want to look at the perlintro and perlsyn documents in order to really get started with understanding Perl (i.e., Read The Flipping Manual). :-)
That said:
$this is a scalar, which can store a number (int or float), a string, or a reference (see below);
@that is an array, which can store an ordered list of scalars (see above). You can add a scalar to an array with the push or unshift functions (see perlfunc), and you can use a parentheses-bounded comma-separated list of scalar literals or variables to create an array literal (e.g., my @array = ($a, $b, 6, "seven");)
%those is a hash, which is an associative array. Hashes have key-value pairs of entries, such that you can access the value of a hash by supplying its key. Hash literals can also be specified much like lists, except that every odd entry is a key and every even one is a value. You can also use the => operator instead of a comma to separate a key and a value. (e.g., my %ordinals = ("one" => "first", "two" => "second");)
Normally, when you pass arrays or hashes to subroutine calls, the individual lists are flattened into one long list. This is sometimes desirable, sometimes not. In the latter case, you can use references to pass a reference to an entire list as a single scalar argument. The syntax and semantics of references are tricky, though, and fall beyond the scope of this answer. If you want to check it out, though, see perlref.

How can I access the last Perl hash key without using a temporary array?

How can I access the last element of keys in a hash without having to create a temporary array?
I know that hashes are unordered. However, there are applications (like mine), in which my keys can be ordered using a simple sort call on the hash keys. Hope I've explained why I wanted this. The barney/elmo example is a bad choice, I admit, but it does have its applications.
Consider the following:
my %hash = ( barney => 'dinosaur', elmo => 'monster' );
my @array = sort keys %hash;
print $array[$#array];
# prints "elmo"
Any ideas on how to do this without calling on a temp (@array in this case)?
print( (keys %hash)[-1]);
Note that the extra parens are necessary to prevent syntax confusion with print's param list.
You can also use this evil trick to force it into scalar context and do away with the extra parens:
print ~~(keys %hash)[-1];
Generally, assuming you want last, sorted alphabetically, it's simple:
use List::Util qw( maxstr );
print maxstr(keys %hash);
If you'd prefer not to use a module (which I don't see a valid reason for, but there are people who like to make it harder):
print( (sort keys %hash)[-1] );
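Side by side, the two variants give the same answer for the question's hash. This is only a sketch; the variable names are made up:

```perl
use strict;
use warnings;
use List::Util qw(maxstr);

my %hash = (barney => 'dinosaur', elmo => 'monster');

my $last_by_sort   = (sort keys %hash)[-1];    # sorts every key, keeps one
my $last_by_maxstr = maxstr(keys %hash);       # single pass, no sorted list

print "$last_by_sort $last_by_maxstr\n";       # elmo elmo
```

maxstr only finds the string-wise maximum, so it replaces the sort only when the default string ordering is the one you want.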
Hashes are unordered, so there is no such thing as the "last element." The functions for iterating over a hash (keys, values, and each) have an order, but it's not anything that you should rely on.
Technically speaking, hashes have a "hash order" which is what the iterators use. Hash order is dependent on the hashing algorithm, which can change (and has) between different versions of Perl. Moreover, as of version 5.8.1 Perl contains hash randomization features that can change the hashing algorithm in order to prevent certain types of attacks.
In general, if you care about order you should be using an array instead.
According to perldoc perldata:
Hashes are unordered collections of
scalar values indexed by their
associated string key.
Since hashes are unordered, there is no "last" element. Sorry.
To make everyone else's point more clear, a hash's keys in Perl will have the same order every time you call keys, values, or each within the same process's lifetime, assuming the hash has not been modified. From perlfunc:
The keys are returned in an apparently random order. The actual random order is subject to change in future versions of perl, but it is guaranteed to be the same order as either the values or each function produces (given that the hash has not been modified). Since Perl 5.8.1 the ordering is different even between different runs of Perl for security reasons (see "Algorithmic Complexity Attacks" in perlsec).
$h{'11c'} = 'C';
$h{'b'} = 'B';
$h{'e22'} = 'E';
$h{'aaaaa'} = 'AAAA';
for (keys %h){
$a = \$h{$_} and $b = $_ if $a < \$h{$_};
}
print "$b\n";
But be careful, for the obvious reasons.
Hashes are unordered, so the last element of your hash might just happen to be elmo.