*why* does list assignment flatten its left hand side? - variable-assignment

I understand that list assignment flattens its left hand side:
my ($a, $b, $c);
($a, ($b, $c)) = (0, (1.0, 1.1), 2);
say "\$a: $a"; # OUTPUT: «$a: 0»
say "\$b: $b"; # OUTPUT: «$b: 1 1.1» <-- $b is *not* 1
say "\$c: $c"; # OUTPUT: «$c: 2» <-- $c is *not* 1.1
I also understand that we can use :($a, ($b, $c)) := (0, (1.0, 1.1)) to get the non-flattening behavior.
What I don't understand is why the left hand side is flattened during list assignment. And that's kind of two questions: First, how does this flattening behavior fit in with the rest of the language? Second, does auto-flattening allow any behavior that would be impossible if the left hand side were non-flattening?
On the first question, I know that Raku historically had a lot of auto-flattening behavior. Before the Great List Refactor, an expression like my @a = 1, (2, 3), 4 would auto-flatten its right hand side, resulting in the Array [1, 2, 3, 4]; similarly, map and many other iterating constructs would flatten their arguments. Post-GLR, though, Raku basically never flattens a list without being told to. In fact, I can't think of any other situation where Raku flattens without flat, .flat, |, or *@ being involved somehow. (@_ creates an implicit *@). Am I missing something, or is the LHS behavior in list assignment really inconsistent with post-GLR semantics? Is this behavior a historical oddity, or does it still make sense?
With respect to my second question, I suspect that the flattening behavior of list assignment may somehow help support for laziness. For example, I know that we can use list assignment to consume certain values from a lazy list without producing/calculating them all – whereas using := with a list will need to calculate all of the RHS values. But I'm not sure if/how auto-flattening the LHS is required to support this behavior.
I also wonder if the auto-flattening has something to do with the fact that = can be passed to meta operators – unlike :=, which generates a "too fiddly" error if used with a metaoperator. But I don't know how/if auto-flattening makes = less "fiddly".
[edit: I've found IRC references to the "(GLR-preserved) decision that list assignment is flattening" as early as 2015-05-02, so it's clear that this decision was intentional and well-justified. But, so far, I haven't found that justification and suspect that it may have been decided at in-person meetings. So I'm hoping someone knows.]
Finally, I also wonder how the LHS is flattened, at a conceptual level. (I don't mean in the Rakudo implementation specifically; I mean as a mental model). Here's how I'd been thinking about binding versus list assignment:
my ($a, :$b) := (4, :a(2)); # Conceptually similar to calling .Capture on the RHS
my ($c, $d, $e);
($c, ($d, $e)) = (0, 1, 2); # Conceptually similar to calling flat on the LHS
Except that actually calling .Capture on the RHS in line 1 works, whereas calling flat on the LHS in line 3 throws a Cannot modify an immutable Seq error – which I find very confusing, given that we flatten Seqs all the time. So is there a better mental model for thinking about this auto-flattening behavior?
Thanks in advance for any help. I'm trying to understand this better as part of my work to improve the related docs, so any insight you can provide would support that effort.

Somehow, answering the questions parts in the opposite order felt more natural to me. :-)
Second, does auto-flattening allow any behavior that would be impossible if the left hand side were non-flattening?
It's relatively common to want to assign the first (or first few) items of a list into scalars and have the rest placed into an array. List assignment descending into iterables on the left is what makes this work:
my ($first, $second, @rest) = 1..5;
.say for $first, $second, @rest;
The output being:
1
2
[3 4 5]
With binding, which respects structure, it would instead be more like:
my ($first, $second, *@rest) := |(1..5);
First, how does this flattening behavior fit in with the rest of the language?
In general, operations where structure would not have meaning flatten it away. For example:
# Process arguments
my $proc = Proc::Async.new($program, @some-args, @some-others);
# Promise combinators
await Promise.anyof(@downloads, @uploads);
# File names
unlink @temps, @previous-output;
# Hash construction
my @a = x => 1, y => 2;
my @b = z => 3;
dd hash @a, @b; # {:x(1), :y(2), :z(3)}
List assignment could, of course, have been defined in a structure-respecting way instead. These things tend to happen for multiple reasons, but for one, the language already has binding for when you do want to do structured things, and for another, my ($first, @rest) = @all is just a bit too common to send folks wanting it down the binding/slurpy power tool path.

Related

What is the most efficient operator to compare any two items?

Frequently I need to convert data from one type to another and then compare them. Some operators will convert to specific types first and this conversion may cause loss of efficiency. For instance, I may have
my ($a, $b) = 0, "foo"; # initial value
$a = (3,4,5).Set; # re-assign value
$b = open "dataFile"; # re-assign value
if $a eq $b { say "okay"; } # convert to string
if $a == 5 { say "yes"; } # convert to number
if $a == $b {} # error, Cannot resolve caller Numeric(IO::Handle:D: );
The operators "eq" and "==" will convert data to the digestible types first and may slow things down. Will the operators "eqv" and "===" skip converting data types and be more efficient if data to be compared cannot be known in advance (i.e., you absolutely have no clue what you are going to get in advance)?
It's not quite clear to me from the question if you actually want the conversions to happen or not.
Operators like == and eq are really calls to multi subs with names like infix:<==>, and there are many candidates. For example, there's one for (Int, Int), which is selected if we're comparing two Ints. In that case, it knows that it doesn't need to coerce, and will just do the integer comparison.
The eqv and === operators will not coerce; the first thing they do is to check that the values have the same type, and if they don't, they go no further. Make sure to use the correct one depending on whether you want snapshot semantics (eqv) or reference semantics (===). Note that the types really must be exactly the same, so 1e0 === 1 will not come out true because one value is a Num and the other an Int.
The auto-coercion behavior of operators like == and eq can be really handy, but from a performance angle it can also be a trap. They coerce, use the result of the coercion for the comparison, and then throw it away. Repeatedly doing comparisons can thus repeatedly trigger coercions. If you have that situation, it makes sense to split the work into two phases: first "parse" the incoming data into the appropriate data types, and then go ahead and do the comparisons.
Finally, in any discussion on efficiency, it's worth noting that the runtime optimizer is good at lifting out duplicate type checks. Thus while in principle, if you read the built-ins source, == might seem cheaper in the case where it gets two things that have the same type, because it isn't enforcing that they are precisely the same type, in reality that extra check will get optimized out for === anyway.
Both === and eqv first check whether the operands are of the same type, and will return False if they are not. So at that stage, there is no real difference between them.
The a === b operator is really short for a.WHICH eq b.WHICH. So it would call the .WHICH method on the operands, which could be expensive if an operand is something like a really large Buf.
The a eqv b operator is more complicated in that it has special cased many object comparisons, so in general you cannot say much about it.
In other words: YMMV. And if you're really interested in performance, benchmark! And be prepared to adapt your code if another way of solving the problem proves more performant.

Concerns with concatenating strings and ints

I have taken a principles of programming class and have been given a Perl expression that concatenates a string number to an int number and then adds another number to it and it evaluates fine. i.e. ("4" . 3) + 7 == 50.
I'm trying to understand why Perl does this and what concerns it may bring up. I'm having a hard time grasping many of the concepts of the class and am trying to get explanations from different sources apart from my horrible teacher and equally horrible notes.
Can the concept behind this kind of expression be explained to me as well as concerns they might bring up? Thanks in advance for the help.
Edit: For Clarity
Perl is built around the central concept of 'do what I mean'.
A scalar is a multi purpose variable type, and is intended to implicitly cast values to a data type that's appropriate to what you're doing.
The reason this works is because perl is context sensitive - it knows the difference between different expected return values.
At a basic level, you can see this with the wantarray function. (Which as noted below - is probably badly named, because we're talking about a LIST context)
sub context_test {
    if ( not defined wantarray() ) {
        print "Void context\n";
    }
    if ( wantarray() ) {
        return ( "List", "Context" );
    }
    else {
        return "scalar context";
    }
}
context_test();
my $scalar = context_test();
my @list = context_test();
print "Scalar context gave me $scalar\n";
print "List context gave me @list\n";
This principle occurs throughout perl. If you want, you can use something like Contextual::Return to extend this further - testing the difference between numeric, string and boolean subsets of scalar contexts.
The reason I mention this is because a scalar is a special sort of data type - if you look at Scalar::Util you will see a capability of creating a dualvar - a scalar that has different values in different contexts.
use Scalar::Util; # load the module so the fully qualified calls below resolve
my $dualvar = Scalar::Util::dualvar ( 666, "the beast" );
print "Numeric:",$dualvar + 0,"\n";
print "String:",$dualvar . '',"\n";
Now, messing around with dualvars is a good way to create some really annoying and hard to trace bugs, but the point is - a scalar is a magic datatype, and perl is always aware of what you're doing with the result.
If you perform a string operation, perl treats it as a string. If you perform a numeric operation, perl tries to treat it as a number.
my $value = '4'; #string;
my $newvalue = $value . 3; #because we concat, perl treats _both_ as strings.
print $newvalue,"\n";
my $sum = $newvalue + 7; #perl turns strings back to numbers, because we're adding;
print $sum,"\n";
if ( Scalar::Util::isdual ( $newvalue ) ) { print "newvalue Is a dual var\n" };
if ( not Scalar::Util::isdual ( $sum ) ) { print "sum is NOT a dual var\n"; };
Mostly 'context' is something that happens behind the scenes in perl, and you don't have to worry about it. If you've come from a programming background, the idea of implicit casting between int and string may seem a little bit dirty. But it mostly works fine.
You may occasionally get warnings like:
Argument "4a3" isn't numeric in addition (+)
One of the downsides of this approach is that these only show up at runtime (and only when warnings are enabled), because you're not doing strong type checking at 'compile' time.
So in terms of specific concerns:
You're runtime type checking, not compile time. If you have strict types, you can detect an attempt to add a string to an int before you start to run anything.
You're not always operating in the context that you assume you are, which can lead to some unpredictable behaviour. One of the best examples is that print operates in a list context - so to take the example above:
print context_test();
You'll get List Context.
If you monkey around with context sensitive return types, you can create some really annoying bugs that are immensely irritating to back trace and troubleshoot.
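The runtime-checking concern above can be made concrete with a small sketch. It assumes you want a non-numeric string to fail loudly instead of producing a warning; `checked_add` is a hypothetical helper, not something from the question, and `looks_like_number` comes from the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: add two values, but refuse anything that
# Perl would not cleanly treat as a number.
sub checked_add {
    my ( $x, $y ) = @_;
    for my $v ( $x, $y ) {
        die "not numeric: '$v'\n" unless looks_like_number($v);
    }
    return $x + $y;
}

# "4" . 3 is the string "43"; adding 7 uses it as the number 43.
my $sum = checked_add( "4" . 3, 7 );
print "$sum\n";

# "4a3" would only trigger a warning with plain +; here it is rejected.
my $ok = eval { checked_add( "4a3", 7 ); 1 };
print "rejected: $@" unless $ok;
```

This still checks at runtime, so it does not remove the concern; it only turns a silent or warning-level coercion into an explicit failure.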

In Perl, on what basis should one choose between passing a reference to a hash/array and passing a hashref/arrayref scalar?

To be brief, I'm wondering if there are any best-practice reasons for deciding between:
my %hash = ( foo => 1, bar => 2 );
# some in-between logic
some_func(\%hash);
and
my $hashref = { foo => 1, bar => 2 };
# some in-between logic
some_func($hashref);
Or is purely a style decision?
The two are equivalent and interchangeable. The decision should be made based on what is clearest for you.
You can also move back and forth:
my $hashref = {x => 1, y => 2};
our %hash; *hash = $hashref;
some_func($hashref);
some_func(\%hash); # \%hash == $hashref
In general I prefer to work with the plural forms %name and @name since it results in less line noise due to dereferencing. That and some_func(\%var) is clearer with regard to var's type than some_func($var)
These two examples do exactly the same thing. So it mainly depends on which is more convenient for what else, if anything, you do with the %list variable and/or $listref variable.
Or maybe you'd like to skip the extra variable entirely:
some_func( { foo => 1, bar => 2 } );
(Less likely now that you've added those "in-between logic" comments.)
As indicated above, they're the same though the first may give you better context to what type of variable (i.e. a hash) you're using in your "some in-between logic" code.
In most cases it won't make a difference, but I have had some nasty bugs that occurred from trying to change where my reference points if I pass in a reference-of list.
Unless I really want to change the contents of a pre-existing array or hash, I personally favor using anonymous lists/hashes (like your second example).
This is just personal experience, though. I will admit that maybe if I understood Perl better I would know the objective best practices (if any exist).
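To illustrate the point that the two call styles are interchangeable, here is a minimal sketch. `add_baz` is a hypothetical stand-in for `some_func`; it receives a hash reference either way and modifies the caller's data in place.

```perl
use strict;
use warnings;

# Hypothetical function: takes a hash reference and adds a key in place.
sub add_baz {
    my ($h) = @_;
    $h->{baz} = 3;
    return scalar keys %$h;
}

my %hash    = ( foo => 1, bar => 2 );
my $hashref = { foo => 1, bar => 2 };

# Both calls hand the function the same kind of value: a hash reference.
my $n_named = add_baz( \%hash );
my $n_anon  = add_baz($hashref);

print "named: $n_named keys, baz=$hash{baz}\n";
print "anon:  $n_anon keys, baz=$hashref->{baz}\n";
```

The only difference is on the caller's side: `$hash{baz}` versus `$hashref->{baz}` in the surrounding code.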

What is the most elegant way in Perl to expand an iterator into a list?

I have an iterator with this interface: $hit->next_hsp
The current implementation to listify it is:
my @list;
while ( my $hsp = $hit->next_hsp ) {
    push( @list, $hsp );
}
Now I'm thinking that there might be better ways to do this in less code. What do you say, stackers?
All iterators I've ever seen return undef to signify that they are exhausted. Therefore you should write while (defined(my $hsp = $hit->next_hsp)). The following example demonstrates the fault in the question which tests for truth (aborts at 1) instead of definedness (passes 'liftoff').
use 5.010;
my $hit = __PACKAGE__;
sub next_hsp {
    state $i;
    $i++;
    return ['mumble', 4, 3, 2, 1, 0, 'liftoff']->[$i];
}
# insert snippet from question here
It entirely depends on the iterator implementation. If next_hsp is the only available method, then you're doing it right.
Don't worry about playing golf, the code you have looks just fine (other than the other answers about using defined). However, if you find yourself repeating this pattern 2 things come to mind.
The first is obvious, refactor it into a utility function, so that you have my @list = expand($hit).
The second question is a bit deeper - but to me smells more than playing golf. The whole point of iterators is to consume as you need them, so if you find yourself doing this often, are you sure it's really the right thing to do? Perhaps you're moving this data outside your own API, so you're constrained to others' choices, but if you have the option of consuming an iterator rather than a list, maybe this will be a cleaner solution.
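The `expand` utility suggested above might look something like the following sketch. Both `expand` and the `CountdownIter` class are hypothetical names made up for illustration; the real `$hit` object would replace the stand-in iterator. The loop tests definedness, as recommended earlier, so a false value such as 0 does not end the loop early.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: drain an iterator into a list. The method name
# defaults to next_hsp; iteration stops at the first undef.
sub expand {
    my ( $iter, $method ) = @_;
    $method //= 'next_hsp';
    my @items;
    while ( defined( my $item = $iter->$method ) ) {
        push @items, $item;
    }
    return @items;
}

# Minimal stand-in iterator, only here to make the sketch runnable.
package CountdownIter;

sub new {
    my ( $class, @items ) = @_;
    return bless { items => [@items] }, $class;
}

# Returns the next item, or undef once the items are exhausted.
sub next_hsp {
    my ($self) = @_;
    return shift @{ $self->{items} };
}

package main;

my $hit  = CountdownIter->new( 3, 2, 1, 0, 'liftoff' );
my @list = expand($hit);
print scalar(@list), " items: @list\n";
```

An iterator that can legitimately return undef as an item would need a different exhaustion signal, so this sketch assumes undef always means "done".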

What's the best Perl practice for returning hashes from functions?

I am mulling over a best practice for passing hash references for return data to/from functions.
On the one hand, it seems intuitive to pass only input values to a function and have only return output variables. However, passing hashes in Perl can only be done by reference, so it is a bit messy and would seem more of an opportunity to make a mistake.
The other way is to pass a reference in the input variables, but then it has to be dealt with in the function, and it may not be clear what is an input and what is a return variable.
What is a best practice regarding this?
Return references to an array and a hash, and then dereference it.
($ref_array,$ref_hash) = $this->getData('input');
@array = @{$ref_array};
%hash = %{$ref_hash};
Pass in references (@array, %hash) to the function that will hold the output data.
$this->getData('input', \@array, \%hash);
Just return the reference. There is no need to dereference the whole hash like you are doing in your examples:
my $result = some_function_that_returns_a_hashref;
say "Foo is ", $result->{foo};
say $_, " => ", $result->{$_} for keys %$result;
etc.
I have never seen anyone pass in empty references to hold the result. This is Perl, not C.
Trying to create copies by saying
my %hash = %{$ref_hash};
is even more dangerous than using the hashref. This is because it only creates a shallow copy. This will lead you to thinking it is okay to modify the hash, but if it contains references they will modify the original data structure. I find it better to just pass references and be careful, but if you really want to make sure you have a copy of the reference passed in you can say:
use Storable qw/dclone/;
my %hash = %{dclone $ref_hash};
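The difference between the shallow copy and `dclone` can be seen in a short sketch. The data structure here is invented for illustration; Storable ships with Perl.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Illustrative data: one plain value and one nested array reference.
my $ref_hash = { name => 'orig', tags => [ 'a', 'b' ] };

my %shallow = %{$ref_hash};              # copies top-level keys only
my %deep    = %{ dclone($ref_hash) };    # copies nested structures too

$shallow{name} = 'changed';              # affects only the copy
push @{ $shallow{tags} }, 'c';           # same array as the original!
push @{ $deep{tags} },    'd';           # independent array

print "original name: $ref_hash->{name}\n";
print "original tags: @{ $ref_hash->{tags} }\n";
print "deep tags:     @{ $deep{tags} }\n";
```

The top-level assignment leaves the original alone, but the push through the shallow copy shows up in the original's `tags`, which is the trap described above.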
The first one is better:
my ($ref_array,$ref_hash) = $this->getData('input');
The reasons are:
in the second case, getData() needs to check the data structures to make sure they are empty
you have freedom to return undef as a special value
it looks more Perl-idiomatic.
Note: the lines
@array = @{$ref_array};
%hash = %{$ref_hash};
are questionable, since you shallow-copy the whole data structures here. You can use references everywhere where you need array/hash, using -> operator for convenience.
If it's getting complicated enough that both the callsite and the called function are paying for it (because you have to think/write more every time you use it), why not just use an object?
my $results = $this->getData('input');
$results->key_value_thingies;
$results->listy_thingies;
If making an object is "too complicated" then start using Moose so that it no longer is.
My personal preference for sub interfaces:
If the routine has 0-3 arguments, they may be passed in list form: foo( 'a', 12, [1,2,3] );
Otherwise pass a list of name value pairs. foo( one => 'a', two => 12, three => [1,2,3] );
If the routine has or may have more than one argument seriously consider using name/value pairs.
Passing in references increases the risk of inadvertent data modification.
On returns I generally prefer to return a list of results rather than an array or hash reference.
I return hash or array refs when it will make a noticeable improvement in speed or memory consumption (ie BIG structures), or when a complex data structure is involved.
Returning references when not needed deprives one of the ability to take advantage of Perl's nice list handling features and exposes one to the dangers of inadvertent modification of data.
In particular, I find it useful to assign a list of results into an array and return the array, which provides the contextual return behaviors of an array to my subs.
For the case of passing in two hashes I would do something like:
my $foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets number of items returned
my @foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets items returned
sub foo {
    my %arg = @_;
    # do stuff
    return @results;
}
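A runnable variant of that pattern, as a rough sketch: the body of `foo` is made up here (it just returns the sorted argument names) so that the two calling contexts can be compared.

```perl
use strict;
use warnings;

# Hypothetical sub: takes name/value pairs and returns an array, so the
# caller's context decides whether it gets the items or their count.
sub foo {
    my %arg     = @_;
    my @results = sort keys %arg;    # stand-in for "do stuff"
    return @results;
}

my %hash1 = ( a => 1 );
my %hash2 = ( b => 2 );

my $count = foo( hash1 => \%hash1, hash2 => \%hash2 );    # scalar context
my @names = foo( hash1 => \%hash1, hash2 => \%hash2 );    # list context

print "count: $count\n";
print "names: @names\n";
```

Returning the named array, not a literal list, is what gives the count in scalar context; `return (1, 2, 3)` would instead yield the last element.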
I originally posted this to another question, and then someone pointed to this as a "related post", so I'll post my take on the subject here too, assuming people will encounter it in the future.
I'm going to contradict the Accepted Answer and say that I prefer to have my data returned as a plain hash (well, as an even-sized list which is likely to be interpreted as a hash). I work in an environment where we tend to do things like the following code snippet, and it's much easier to combine and sort and slice and dice when you don't have to dereference every other line. (It's also nice to know that someone can't damage your hashref because you passed the entire thing by value -- though someone pointed out that if your hash contains more than simple scalars it's not so simple.)
my %filtered_config_slice =
hashgrep { $a !~ /^apparent_/ && defined $b } (
map { $_->build_config_slice(%some_params, some_other => 'param') }
($self->partial_config_strategies, $other_config_strategy)
);
This approximates something that my code might do: building a configuration for an object based on various configuration strategy objects (some of which the object knows about inherently, plus some extra guy) and then filters out some of them as irrelevant.
(Yes, we have nice tools like hashgrep and hashmap and lkeys that do useful things to hashes. $a and $b get set to the key and the value of each item in the list, respectively). (Yes, we have people who can program at this level. Hiring is obnoxious, but we have a quality product.)
If you don't intend to do anything resembling functional programming like this, or if you need more performance (have you profiled?) then sure, use hashrefs.
Uh... "passing hashes can only be done by reference"?
sub foo(%) {
    my %hash = @_;
    do_stuff_with(%hash);
}
my %hash = (a => 1, b => 2);
foo(%hash);
What am I missing?
I would say that if the issue is that you need to have multiple outputs from a function, it's better as a general practice to output a data structure, probably a hash, that holds everything you need to send out rather than taking modifiable references as arguments.