sort uniq(@stuff) doesn't sort, doesn't de-dupe [duplicate]

This question already has answers here:
why does sort with uniq not work together
(2 answers)
Closed 6 years ago.
Some code has me waxing lyrical,
The outcome is slightly hysterical.
sorted output I seek,
All elements unique,
But the results? Far from clinical.
Code in question
use strict;
use warnings;
sub uniq { my %seen; grep ! $seen{$_}++, @_ }
my @test = ();
for ( 1 .. 3 ) {
    @test = sort uniq( @test, qw/ d d c c b b a a / );
    print "@test\n";
}
Output
d d c c b b a a
d d c c b b a a d d c c b b a a
d d c c b b a a d d c c b b a a d d c c b b a a
The Fix
An extra set of parentheses restores parity:
@test = sort( uniq( @test, qw/ d d c c b b a a / ) ); # a b c d
Running the two lines through -MO=Deparse sheds some light on the effect of the extra parens - it forces the interpreter to treat the RHS as sort LIST instead of sort SUBNAME LIST:
# Doesn't work as intended (sort SUBNAME LIST)
@test = (sort uniq @test, ('d', 'd', 'c', 'c', 'b', 'b', 'a', 'a'));
# Works as intended (sort LIST)
@test = sort(uniq(@test, ('d', 'd', 'c', 'c', 'b', 'b', 'a', 'a')));
My Question
Why is the extra set of parentheses necessary?
uniq returns a list, so I'd expect
sort uniq( @stuff );
to be equivalent to
sort LIST

Although it's rarely used, the first form listed in perldoc -f sort is sort SUBNAME LIST, i.e. the optional first argument to sort is the name of a function to use as the sort comparator. The LIST may be written with or without parentheses, and whitespace is free, so
sort uniq( @test, qw/ d d c c b b a a / )
means: sort the list (@test, qw/ d d c c b b a a /) using the function uniq as the comparator. Since uniq has no prototype and its result does not depend on $a and $b, it returns the same false value for every comparison, which sort treats as 0. sort responds to this assertion that everything is equal by leaving the order unchanged (it's a stable sort, since 5.8 at least).

uniq is treated as a sub name because it's an identifier (or qualified identifier) that isn't also the name of a built-in function. No check is made to see whether the sub actually exists (although it would have been found in this case).
To be disqualified from the sort SUBNAME LIST syntax, sort needs to be followed by a built-in function name or by something that is not an identifier or qualified identifier.
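
As a minimal sketch of the point above, here are two spellings that keep uniq from being parsed as SUBNAME. The variable names (@input, @with_block, @with_parens) are illustrative only and not from the original post; both forms are among those suggested in perldoc -f sort for sorting a function's return list.

```perl
use strict;
use warnings;

sub uniq { my %seen; grep { !$seen{$_}++ } @_ }

my @input = qw/ d d c c b b a a /;

# An explicit comparator block fills the "comparator" slot,
# so uniq(...) can only be read as the LIST to sort.
my @with_block = sort { $a cmp $b } uniq(@input);

# Parenthesising sort's whole argument list has the same effect.
my @with_parens = sort( uniq(@input) );

print "@with_block\n";    # a b c d
print "@with_parens\n";   # a b c d
```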

How to print specific key in an array (Perl) [duplicate]

This question already has answers here:
Simple hash search by value
(5 answers)
Closed 5 years ago.
I recently started learning Perl, so I'm not too familiar with the functions and syntax.
If I have a Perl array and some variables,
#!/usr/bin/perl
use strict;
use warnings;
my @numbers = (a =>1, b=> 2, c => 3, d =>4, e => 5);
my $x;
my $range = 5;
$x = int(rand($range));
print "$x";
to generate a random number between 1-5, how can I get the program to print the actual key (a, b, c, etc.) instead of just the number (1, 2, 3, 4, 5)?

It seems that you want a reverse lookup, key-by-value, the opposite of what a hash gives you. Since a hash in list context is just a list of key-value pairs, you can reverse it and use the resulting hash to look up by number.
A couple of corrections: you need a hash variable (not an array), and you need to add 1 to your rand integer generator to get the desired 1..5 range:
use warnings;
use strict;
use feature 'say';
my %numbers = (a => 1, b => 2, c => 3, d => 4, e => 5);
my %lookup_by_number = reverse %numbers; # values need be unique
my $range = 5;
my $x = int(rand $range) + 1;
say $lookup_by_number{$x};
Without reversing the hash you'd need to iterate over the keys of %numbers, testing each value against $x to find its key.
If several keys in your original hash share the same value then you have to do it by hand, since reverse-ing would attempt to create a hash with duplicate keys, in which case only the last one assigned remains. So you'd lose some of the original keys. One way:
my @at_num = grep { $x == $numbers{$_} } keys %numbers;
as in the post that this was marked as a duplicate of.
But then you should build a data structure for reverse lookup, so that you don't search through the list every time the information is needed. This can be a hash whose keys are the unique numbers and whose values are array references (arrayrefs) holding the corresponding keys from the original hash:
use warnings;
use strict;
use feature 'say';
my %num = (a => 1, b => 2, c => 1, d => 3, e => 2); # with duplicate values
my %lookup_by_num;
foreach my $key (keys %num) {
    push @{ $lookup_by_num{$num{$key}} }, $key;
}
say "$_ => [ @{$lookup_by_num{$_}} ]" for keys %lookup_by_num;
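This prints (hash ordering is not fixed, so the order of lines and of letters within each group can differ between runs)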
1 => [ c a ]
3 => [ d ]
2 => [ e b ]
A nice way to display complex data structures is via Data::Dumper, or Data::Dump (or others).
The expression @{ $lookup_by_num{ $num{$key} } } takes the value of %lookup_by_num for the key $num{$key} and dereferences it with @{ ... }, so that push can add $key to it. The critical part is that the first time a given $num{$key} is encountered, the arrayref and its corresponding key are autovivified. See this post with its references for details.
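
To make the autovivification step concrete, here is a small sketch using a hypothetical hash %lookup (not from the answer above). It only shows that dereferencing a missing hash element as an array in a modifying context creates the arrayref on the spot.

```perl
use strict;
use warnings;

my %lookup;

# Nothing is stored under key 1 yet.
print exists $lookup{1} ? "present\n" : "absent\n";   # absent

# push needs an array to modify, so Perl creates $lookup{1} = []
# automatically before appending 'a' to it.
push @{ $lookup{1} }, 'a';

print ref $lookup{1}, "\n";            # ARRAY
print scalar @{ $lookup{1} }, "\n";    # 1
```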

There are many ways to do it. For example, declare "numbers" as a hash rather than an array. Note that the keys come first in each key-value pair, and here you want to use your random int as the key:
my %numbers = ( 0 => 'a', 1 => 'b', 2 => 'c', 3 => 'd', 4 => 'e' );
Then you can look up the "key" as you call it using:
my $key = $numbers{$x};
Note that rand( $range ) returns a number greater than or equal to zero and less than $range. So if you want integers in the range 1-5, you must add 1 in your code: at the moment you'll get 0-4, not 1-5.
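
The range arithmetic can be checked with a quick sketch. The names %letter_for, %seen and @out_of_range are assumptions made for this illustration; the loop simply draws many numbers and confirms that each one has a matching entry in the reversed hash.

```perl
use strict;
use warnings;

my %numbers    = (a => 1, b => 2, c => 3, d => 4, e => 5);
my %letter_for = reverse %numbers;    # 1 => 'a', 2 => 'b', ...
my $range      = 5;

my %seen;
for (1 .. 1000) {
    # int(rand 5) yields 0..4, so adding 1 shifts it to 1..5
    my $x = int(rand $range) + 1;
    $seen{$x}++;
}

# Every drawn number should be a key of the reversed hash.
my @out_of_range = grep { !exists $letter_for{$_} } keys %seen;
print "draws outside 1..5: ", scalar(@out_of_range), "\n";   # draws outside 1..5: 0
```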

Firstly, arrays don't have keys (well, they kind of do, but they're integers and not the values you want). So I think you want a hash, not an array.
my %numbers = (a =>1, b=> 2, c => 3, d =>4, e => 5);
And if you want to get the letter, given the integer then you need the reverse of this hash:
my %rev_numbers = reverse %numbers;
Note that reversing a hash like this only works if the values in your original hash are unique (because reversing a hash makes the values into keys and hash keys are always unique).
Then, you can just look up an integer in your %rev_numbers to get its associated letter.
my $integer = 3;
say $rev_numbers{$integer}; # prints 'c'

Splitting an Array into n accessible parts within perl?

My goal is to take an array of letters and cut it up into "n" parts. In this case no more than 10 letters each piece. But I want these arrays to be stored into an array reference which I can access on a counter.
For example, I have the following script to split an array of English alphabetical letters into 1 array of 10 letters. But since the English Alphabet has 26 letters, I need 2 more arrays to access in an array reference.
#!/usr/bin/env perl
#split an array into parts.
use strict;
use warnings;
use feature 'say';
my @letters = ('A' .. 'Z');
say "These are my letters:";
for(@letters){print "$_ ";}
my @letters_selected = splice(@letters, 0, 10);
say "\nThese are my selected letters:";
for(@letters_selected){print "$_ ";}
The output is this:
These are my letters:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
These are my selected letters:
A B C D E F G H I J
This little script only gives me one piece of 10 letters of the alphabet. But I want all three pieces of 10 letters of the alphabet, so I would like to know how I can achieve this:
Goal:
Have an array reference called letters_selected of letters which contains all letters A - Z. But ... I can access all three pieces of size less than or equal to 10 letters like this.
foreach(@{$letters_selected[0]}){say "$_ ";}
returns: A B C D E F G H I J # These are the initial 10 elements of the alphabet.
foreach(@{$letters_selected[1]}){say "$_ ";}
returns: K L M N O P Q R S T # The next 10 after that.
foreach(@{$letters_selected[2]}){say "$_ ";}
returns: U V W X Y Z # The next no more than 10 after that.

Since splice is destructive to its target you can keep applying it
use warnings;
use strict;
use feature 'say';
my @letters = 'A'..'Z';
my @letter_groups;
push @letter_groups, [ splice @letters, 0, 10 ] while @letters;
say "@$_" for @letter_groups;
After this @letters is empty, so make a copy and work with that if you will need the original later.
Every time through, splice removes and returns elements from @letters, and [ ] makes an anonymous array of that list. This reference is pushed onto @letter_groups.
splice takes as many elements as there are when fewer than 10 remain, so the final pass removes and returns whatever is left; @letters is then empty and the while loop terminates.
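
If the original array must survive, one option is to wrap the same loop in a sub so that splice only ever consumes a copy. This is a sketch, not part of the answer above: the sub name chunk and its ($size, @items) signature are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list into arrayrefs of at most $size items.
sub chunk {
    my ($size, @items) = @_;    # @items is a copy of the caller's list
    my @groups;
    push @groups, [ splice @items, 0, $size ] while @items;
    return @groups;
}

my @letters = ('A' .. 'Z');
my @groups  = chunk(10, @letters);

print scalar(@letters), "\n";    # 26 (the original array is untouched)
print "@$_\n" for @groups;
# A B C D E F G H I J
# K L M N O P Q R S T
# U V W X Y Z
```

Each group is then available by index, e.g. @{ $groups[2] } for the last, shorter piece.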

perl6 What is the best way to match any of a group of words?

I am trying to find a simple way to match any of a group of words. I have been using a for loop, but is there a simpler way?
my @a=<a b c d e f>;
my $x="a1234567";
say $x ~~ m/ @a.any /;
It returns False. Is there a way to make it work? Thanks.

my @a = <a b c d e f>;
my $x = "a1234567";
say $x ~~ /@a/;
/@a/ is the same as /| @a/, which is longest-token alternation. For sequential (first-match) alternation you can use /|| @a/.

Perl script to check another array values depending on current array index

I'm working on a Perl assignment that has three arrays, @array_A, @array_B and @array_C, with some values in them. I grep for the string "CAT" in @array_A and fetch its indices too:
my @index = grep { $@array_A[$_] =~ 'CAT' } 0..$#array_A;
print "Index : @index\n";
Output: Index : 2 5
I have to take this as an input and check the value of other two arrays at indices 2 and 5 and print it to a file.
Trick is the position of the string - "CAT" varies. (Index might be 5 , 7 and 9)
I'm not quite getting the logic here , looking for some help with the logic.

Here's a deliberately verbose example of how to extract the values you want, to show what's happening while hopefully leaving some room for you to investigate further. Note that it's idiomatic Perl to use regex delimiters with =~, e.g. $name =~ /steve/.
use warnings;
use strict;
my @a1 = qw(AT SAT CAT BAT MAT CAT SLAT);
my @a2 = qw(a b c d e f g);
my @a3 = qw(1 2 3 4 5 6 7);
# note the difference in the next line... no @ symbol...
my @indexes = grep { $a1[$_] =~ /CAT/ } 0..$#a1;
for my $index (@indexes){
    my $a2_value = $a2[$index];
    my $a3_value = $a3[$index];
    print "a1 index: $index\n" .
          "a2 value: $a2_value\n" .
          "a3 value: $a3_value\n" .
          "\n";
}
Output:
a1 index: 2
a2 value: c
a3 value: 3
a1 index: 5
a2 value: f
a3 value: 6
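
Once the indices are known, the same values can also be pulled out in one step with array slices. This is a compact sketch of that alternative, reusing the sample data from the answer; the names @a2_hits and @a3_hits are assumptions for illustration.

```perl
use strict;
use warnings;

my @a1 = qw(AT SAT CAT BAT MAT CAT SLAT);
my @a2 = qw(a b c d e f g);
my @a3 = qw(1 2 3 4 5 6 7);

# Indices of the elements of @a1 that match /CAT/
my @indexes = grep { $a1[$_] =~ /CAT/ } 0 .. $#a1;

# An array slice takes a list of indices and returns those elements.
my @a2_hits = @a2[@indexes];
my @a3_hits = @a3[@indexes];

print "@indexes\n";    # 2 5
print "@a2_hits\n";    # c f
print "@a3_hits\n";    # 3 6
```

Because the slice is driven by @indexes, it keeps working wherever "CAT" happens to appear in the first array.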

Why do I get an error when I try to use the repetition assignment operator with an array?

#!/usr/bin/perl
use strict;
use warnings;
my @a = qw/a b c/;
(@a) x= 3;
print join(", ", @a), "\n";
I would expect the code above to print "a, b, c, a, b, c, a, b, c\n", but instead it dies with the message:
Can't modify private array in repeat (x) at z.pl line 7, near "3;"
This seems odd because the X <op>= Y operators are documented as being equivalent to X = X <op> Y, and the following code works as I expect it to:
#!/usr/bin/perl
use strict;
use warnings;
my @a = qw/a b c/;
(@a) = (@a) x 3;
print join(", ", @a), "\n";
Is this a bug in Perl or am I misunderstanding what should happen here?

My first thought was that this was a misunderstanding of some Perl subtlety, namely that the parens around @a made it parse as an attempt to assign to a list. (The list itself, not normal list assignment.) That conclusion seems to be supported by perldiag:
Can't modify %s in %s
(F) You aren't allowed to assign to the item indicated, or otherwise try to
change it, such as with an auto-increment.
Apparently that's not the case, though. If it were, this should have the same error:
($x) x= 3; # ok
More conclusively, this gives the same error:
@a x= 3; # Can't modify private array in repeat...
Ergo, definitely a bug. File it.

My guess is that Perl is not a language with full symbolic transformations. It tries to figure out what you mean. If you "list-ify" @a by putting it in parens, it sort of loses what you wanted to assign it to.
Notice that this does not do what we want:
my @b = @a x 3; # we'll get scalar( @a ) --> '3' x 3 --> '333'
But this does:
my @b = ( @a ) x 3;
As does:
( @a ) = ( @a ) x 3;
So it seems that when the expression actually appears on both sides, Perl interprets the two in different contexts. It knows that we're assigning something, so it tries to find out what we're assigning to.
I'd chalk it up to a bug, from a very seldom used syntax.

The problem is that you're trying to modify @a in place, which Perl evidently doesn't allow you to do. Your second example does something subtly different: it creates a new list consisting of @a repeated three times, then overwrites @a with that value.
Arguably the first form should be transparently translated to the second form, but that isn't what actually happens. You could consider this a bug... file it in the appropriate places and see what happens.
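
For reference, a short sketch of how x behaves depending on whether its left operand is parenthesised, ending with the explicit assignment that works in place of x=. The names @scalar_rep and @list_rep are made up for this illustration.

```perl
use strict;
use warnings;

my @a = qw/a b c/;

# Without parens the left operand is in scalar context:
# @a evaluates to its length (3), and '3' x 3 is the string '333'.
my @scalar_rep = @a x 3;

# With parens, x repeats the list itself.
my @list_rep = (@a) x 3;

# Spelling out the assignment avoids x= on an array altogether.
@a = (@a) x 3;

print "@scalar_rep\n";         # 333
print "@list_rep\n";           # a b c a b c a b c
print join(", ", @a), "\n";    # a, b, c, a, b, c, a, b, c
```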