Perl: What is the easiest way to flatten a multidimensional array? - perl

What's the easiest way to flatten a multidimensional array ?

One level of flattening using map
$ref = [[1,2,3,4],[5,6,7,8]]; # AoA
#a = map {#$_} #$ref; # flattens it
print "#a"; # 1 2 3 4 5 6 7 8

Using List::Flatten seems like the easiest:
use List::Flatten;
my #foo = (1, 2, [3, 4, 5], 6, [7, 8], 9);
my #bar = flat #foo; # #bar contains 9 elements, same as (1 .. 9)
Actually, that module exports a single simple function flat, so you might as well copy the source code:
sub flat(#) {
return map { ref eq 'ARRAY' ? #$_ : $_ } #_;
}
You could also make it recursive to support more than one level of flattening:
sub flat { # no prototype for this one to avoid warnings
return map { ref eq 'ARRAY' ? flat(#$_) : $_ } #_;
}

The easiest and most natural way, is to iterate over the values and use the # operator to "dereference" / "unpack" any existing nested values to get the constituent parts. Then repeat the process for every reference value encountered.
This is similar to Viajayenders solution, but works for values not already in an array reference and for any level of nesting:
sub flatten {
map { ref $_ ? flatten(#{$_}) : $_ } #_;
}
Try testing it like so:
my #l1 = [ 1, [ 2, 3 ], [[[4]]], 5, [6], [[7]], [[8,9]] ];
my #l2 = [ [1,2,3,4,5], [6,7,8,9] ];
my #l3 = (1, 2, [3, 4, 5], 6, [7, 8], 9); # Example from List::Flatten
my #r1 = flatten(#l1);
my #r2 = flatten(#l1);
my #r3 = flatten(#l3);
if (#r1 ~~ #r2 && #r2 ~~ #r3) { say "All list values equal"; }

if data is always like an example, I recommend List::Flatten too.
but data has more than 2 nested array, flat cant't work.
like #foo = [1, [2, [3, 4, 5]]]
in that case, you should write recursive code for it.
how about bellow.
sub flatten {
my $arg = #_ > 1 ? [#_] : shift;
my #output = map {ref $_ eq 'ARRAY' ? flatten($_) : $_} #$arg;
return #output;
}
my #foo = (1, 2, [3, 4, 5, [6, 7, 8]], 9);
my $foo = [1, 2, [3, 4, 5, [6, 7, 8]], 9];
my #output = flatten #foo;
my #output2 = flatten $foo;
print "#output";
print "#output2";

The easiest way to flatten a multidimensional array when it includes:
1. arrays
2. array references
3. scalar values
4. scalar references
sub flatten {
map { ref $_ eq 'ARRAY' ? flatten(#{$_}) :
ref $_ eq 'SCALAR' ? flatten(${$_}) : $_
} #_;
}
The other flatten sub answer crashes on scalar references.

Something along the lines of:
my $i = 0;
while ($i < scalar(#array)) {
if (ref #array[$i] eq 'ARRAY') {
splice #array, $i, 1, #$array[$i];
} else {
$i++;
}
}
I wrote it blindly, no idea if it actually works but you should get the idea.

Same as Vijayender's solution but will work on mixed arrays containing arrayrefs and scalars.
$ref = [[1,2,3,4],[5,6,7,8],9,10];
#a = map { ref $_ eq "ARRAY" ? #$_ : $_ } #$ref;
print "#a"
Of course you can extend it to also dereference hashrefs:
#a = map { ref $_ eq "ARRAY" ? #$_ : ref $_ eq "HASH" ? %$_: $_ } $#ref;
or use grep to weed out garbage:
#a = map { #$_} grep { ref $_ eq 'ARRAY' } #$ref;
As of List::MoreUtils 0.426 we have an arrayify function that flattens arrays recursively:
#a = (1, [[2], 3], 4, [5], 6, [7], 8, 9);
#l = arrayify #a; # returns 1, 2, 3, 4, 5, 6, 7, 8, 9
It was introduced earlier but was broken.

Related

Perl, How to sort hash (of arrays) keys according to specific positions in arrays

I have hash of array references. I want to sort hash keys according to those arrays' last element and if they are equal, then i want to sort them according to previous element and so on.
i have written a simple custom sort subroutine which sorts according to last element
our %hash = (); #
sub customsort($$)
{ ${$hash{$_[0]}}[-1] <=> ${$hash{$_[1]}}[-1] }
I know i need to pass another argument $j instead of predefined -1 for fixed last element. Then i will set up a loop inside subroutine with some checks, etc. However i couldn't figure out how to pass it while using the subroutine in actual part of code
foreach my $key (sort customsort keys (%hash) ) {..}
Thanks in advance
Here's one way to do it:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
my %hash = (
foo => [ 1, 2, 3, 4, 5 ],
bar => [ 8, 6, 7, 5, 3, 0, 9 ],
baz => [ 5, 5, 5, 5, 5 ],
baz2 => [ 5, 5, 5, 5, 5 ],
);
sub customsort {
my $res;
my $index = -1;
while (1) {
return 0 if ($#{$hash{$a}} + $index < 0) || ($#{$hash{$b}} + $index < 0);
$res = ${$hash{$a}}[$index] <=> ${$hash{$b}}[$index];
return $res if $res;
$index--;
}
}
my #sorted = sort customsort keys %hash;
say $_ for #sorted;
I'm using $a and $b instead of the ($$) prototype because Perl prototypes are generally best avoided, but also note that, according to perldoc sort, using the prototype is slower. So just embrace the magic of $a and $b.
The return 0 if... line is to prevent warnings if you have arrays of different lengths (bar) that have to look back beyond the beginning of a shorter array, and to prevent infinite loops if you have identical arrays (baz and baz2).

perl: function that gives nonzero integers between 1 and $input

Im looking for (how to make?) a function (module?) that for
my $scalar = 16;
return function ($scalar);
gives
#return = ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 );
That is, gives the non-zero integers between 1 and $scalar.
It's ok to assume $scalar is a big number, but im not particularly searching for a super optimal solution.
The Range Operator .. returns a list which matches exactly what you want.
To create a list from 1 to $x, the syntax is just 1 .. $x
To assign that to an array variable, #array = 1 .. $x;
[1..16]
creates an array reference
1..16 creates a list.
Try this source
use Data::Dumper;
$c = [1..16];
#d = 1..16;
print Dumper $c;
print Dumper \#d;
sub getvalue {
my #array1 = 1..$_[0];
return (#array1);
}
else
sub getvalue {
return ( 1..$_[0]);
}
refer this too know more about range operator
http://www.perlmonks.org/?node_id=377450
What happens when you want all integers between 1 and 1_000_000_000? You wouldn't want to create an array that big even if your computer has enough memory.
#!/usr/bin/env perl
use strict;
use warnings;
sub make_lazy_increasing_sequence {
my ($current, $end) = map int, #_;
return sub {
return if $current > $end;
return $current++;
}
}
my $from_1_to_5 = make_lazy_increasing_sequence(1, 5);
my #others;
while (defined(my $i = $from_1_to_5->())) {
push #others, make_lazy_increasing_sequence($i, 10_000);
}
for (1 .. 10) { # perl makes sure this range is lazy
print join(',', map $_->(), #others), "\n";
}
print $others[-1]->(), "\n";
Output:
1,2,3,4,5
2,3,4,5,6
3,4,5,6,7
4,5,6,7,8
5,6,7,8,9
6,7,8,9,10
7,8,9,10,11
8,9,10,11,12
9,10,11,12,13
10,11,12,13,14
15

How can I dereference an array of arrays in Perl?

How do I dereference an array of arrays when passed to a function?
I am doing it like this:
my #a = {\#array1, \#array2, \#array3};
func(\#a);
func{
#b = #_;
#c = #{#b};
}
Actually I want the array #c should contain the addresses of #array1, #array2, and #array3.
my #a = {\#array1, \#array2, \#array3};
The above is an array with a single member -> a hash containing:
{ ''.\#array1 => \#array2, ''.\#array3 => undef }
Because as a key in the hash, Perl coerces the reference to #array1 into a string. And Perl allows a scalar hash reference to be assigned to an array, because it is "understood" that you want an array with the first element being the scalar you assigned to it.
You create an array of arrays, like so:
my #a = (\#array1, \#array2, \#array3);
And then in your function you would unpack them, like so:
sub func {
my $ref = shift;
foreach my $arr ( #$ref ) {
my #list_of_values = #$arr;
}
}
Or some variation thereof, like say a map would be the easiest expression:
my #list_of_entries = map { #$_ } #$ref;
In your example, #c as a list of addresses is simply the same thing as a properly constructed #a.
You may want to read perldoc perlreftut, perldoc perlref, and perldoc perldsc You can say:
sub func {
my $arrayref = shift;
for my $aref (#$arrayref) {
print join(", ", #$aref), "\n";
}
}
my #array1 = (1, 2, 3);
my #array2 = (4, 5, 6);
my #array3 = (7, 8, 9);
my #a = \(#array1, #array2, #array3);
func \#a;
or more compactly:
sub func {
my $arrayref = shift;
for my $aref (#$arrayref) {
print join(", ", #$aref), "\n";
}
}
func [ [1, 2, 3], [4, 5, 6], [7, 8, 9] ];
Read the perlreftut documentation.
Edit: Others point out a good point I missed at first. In the initialization of #a, you probably meant either #a = (...) (create array containing references) or $arrayref = [...] (create reference to array), not {...} (create reference to hash). The rest of this post pretends you had the #a = (...) version.
Since you pass one argument (a reference to #a) to func, #_ is a list containing that one reference. You can get that reference and then dereference it by doing:
sub func {
my $arrayref = shift;
my #c = #{$arrayref};
}
Or in one line, it would look like:
sub func {
my #c = #{shift()};
}
(If you hadn't used the backslash in func(\#a), #_ would be equal to #a, the array of three references.)
The following function is designed to take either an array or an array reference and give back a sorted array of unique values. Undefined values are removed and HASH and GLOB are left as is.
#!/usr/bin/perl
use strict; use warnings;
my #one = qw / dog rat / ;
my #two = qw / dog mice / ;
my #tre = ( "And then they said it!", "No!?? ", );
open my $H, '<', $0 or die "unable to open $0 to read";
my $dog; # to show behavior with undefined value
my %hash; $hash{pig}{mouse}=55; # to show that it leaves HASH alone
my $rgx = '(?is)dog'; $rgx = qr/$rgx/; # included for kicks
my #whoo = (
'hey!',
$dog, # undefined
$rgx,
1, 2, 99, 999, 55.5, 3.1415926535,
%hash,
$H,
[ 1, 2,
[ 99, 55, \#tre, ],
3, ],
\#one, \#two,
[ 'fee', 'fie,' ,
[ 'dog', 'dog', 'mice', 'gopher', 'piranha', ],
[ 'dog', 'dog', 'mice', 'gopher', 'piranha', ],
],
[ 1, [ 1, 2222, ['no!', 'no...', 55, ], ], ],
[ [ [ 'Rat!', [ 'Non,', 'Tu es un rat!' , ], ], ], ],
'Hey!!',
0.0_1_0_1,
-33,
);
print join ( "\n",
recursively_dereference_sort_unique_array( [ 55, 9.000005555, ], #whoo, \#one, \#whoo, [ $H ], ),
"\n", );
close $H;
exit;
sub recursively_dereference_sort_unique_array
{
# recursively dereference array of arrays; return unique values sorted. Leave HASH and GLOB (filehandles) as they are.
# 2020v10v04vSunv12h20m15s
my $sb_name = (caller(0))[3];
#_ = grep defined, #_; #https://stackoverflow.com/questions/11122977/how-do-i-remove-all-undefs-from-array
my #redy = grep { !/^ARRAY\x28\w+\x29$/ } #_; # redy==the subset that is "ready"
my #noty = grep { /^ARRAY\x28\w+\x29$/ } #_; # noty==the subset that is "not yet"
my $countiter = 0;
while (1)
{
$countiter++;
die "$sb_name: are you in an infinite loop?" if ($countiter > 99);
my #next;
foreach my $refarray ( #noty )
{
my #tmparray = #$refarray;
push #next, #tmparray;
}
#next = grep defined, #next;
my #okay= grep { !/^ARRAY\x28\w+\x29$/ } #next;
#noty = grep { /^ARRAY\x28\w+\x29$/ } #next;
push #redy, #okay;
my %hash = map { $_ => 1 } #redy; # trick to get unique values
#redy = sort keys %hash;
return #redy unless (scalar #noty);
}
}
Should be
func {
$b = shift;
}
if you're passing in a reference. Hope that helps some.

How can I partition a Perl array into equal sized chunks?

I have a fixed-sized array where the size of the array is always in factor of 3.
my #array = ('foo', 'bar', 'qux', 'foo1', 'bar', 'qux2', 3, 4, 5);
How can I cluster the member of array such that we can get
an array of array group by 3:
$VAR = [ ['foo','bar','qux'],
['foo1','bar','qux2'],
[3, 4, 5] ];
my #VAR;
push #VAR, [ splice #array, 0, 3 ] while #array;
or you could use natatime from List::MoreUtils
use List::MoreUtils qw(natatime);
my #VAR;
{
my $iter = natatime 3, #array;
while( my #tmp = $iter->() ){
push #VAR, \#tmp;
}
}
I really like List::MoreUtils and use it frequently. However, I have never liked the natatime function. It doesn't produce output that can be used with a for loop or map or grep.
I like to chain map/grep/apply operations in my code. Once you understand how these functions work, they can be very expressive and very powerful.
But it is easy to make a function to work like natatime that returns a list of array refs.
sub group_by ($#) {
my $n = shift;
my #array = #_;
croak "group_by count argument must be a non-zero positive integer"
unless $n > 0 and int($n) == $n;
my #groups;
push #groups, [ splice #array, 0, $n ] while #array;
return #groups;
}
Now you can do things like this:
my #grouped = map [ reverse #$_ ],
group_by 3, #array;
** Update re Chris Lutz's suggestions **
Chris, I can see merit in your suggested addition of a code ref to the interface. That way a map-like behavior is built in.
# equivalent to my map/group_by above
group_by { [ reverse #_ ] } 3, #array;
This is nice and concise. But to keep the nice {} code ref semantics, we have put the count argument 3 in a hard to see spot.
I think I like things better as I wrote it originally.
A chained map isn't that much more verbose than what we get with the extended API.
With the original approach a grep or other similar function can be used without having to reimplement it.
For example, if the code ref is added to the API, then you have to do:
my #result = group_by { $_[0] =~ /foo/ ? [#_] : () } 3, #array;
to get the equivalent of:
my #result = grep $_->[0] =~ /foo/,
group_by 3, #array;
Since I suggested this for the sake of easy chaining, I like the original better.
Of course, it would be easy to allow either form:
sub _copy_to_ref { [ #_ ] }
sub group_by ($#) {
my $code = \&_copy_to_ref;
my $n = shift;
if( reftype $n eq 'CODE' ) {
$code = $n;
$n = shift;
}
my #array = #_;
croak "group_by count argument must be a non-zero positive integer"
unless $n > 0 and int($n) == $n;
my #groups;
push #groups, $code->(splice #array, 0, $n) while #array;
return #groups;
}
Now either form should work (untested). I'm not sure whether I like the original API, or this one with the built in map capabilities better.
Thoughts anyone?
** Updated again **
Chris is correct to point out that the optional code ref version would force users to do:
group_by sub { foo }, 3, #array;
Which is not so nice, and violates expectations. Since there is no way to have a flexible prototype (that I know of), that puts the kibosh on the extended API, and I'd stick with the original.
On a side note, I started with an anonymous sub in the alternate API, but I changed it to a named sub because I was subtly bothered by how the code looked. No real good reason, just an intuitive reaction. I don't know if it matters either way.
Or this:
my $VAR;
while( my #list = splice( #array, 0, 3 ) ) {
push #$VAR, \#list;
}
Another answer (a variation on Tore's, using splice but avoiding the while loop in favor of more Perl-y map)
my $result = [ map { [splice(#array, 0, 3)] } (1 .. (scalar(#array) + 2) % 3) ];
Try this:
$VAR = [map $_ % 3 == 0 ? ([ $array[$_], $array[$_ + 1], $array[$_ + 2] ])
: (),
0..$#array];
Another generic solution, non-destructive to the original array:
use Data::Dumper;
sub partition {
my ($arr, $N) = #_;
my #res;
my $i = 0;
while ($i + $N-1 <= $#$arr) {
push #res, [#$arr[$i .. $i+$N-1]];
$i += $N;
}
if ($i <= $#$arr) {
push #res, [#$arr[$i .. $#$arr]];
}
return \#res;
}
print Dumper partition(
['foo', 'bar', 'qux', 'foo1', 'bar', 'qux2', 3, 4, 5],
3
);
The output:
$VAR1 = [
[
'foo',
'bar',
'qux'
],
[
'foo1',
'bar',
'qux2'
],
[
3,
4,
5
]
];
As a learning experience I decided to do this in Perl6
The first, perhaps most simplest way I tried was to use map.
my #output := #array.map: -> $a, $b?, $c? { [ $a, $b // Nil, $c // Nil ] };
.say for #output;
foo bar qux
foo1 bar qux2
3 4 5
That didn't seem very scalable. What if I wanted to take the items from the list 10 at a time, that would get very annoying to write. ... Hmmm I did just mention "take" and there is a keyword named take lets try that in a subroutine to make it more generally useful.
sub at-a-time ( Iterable \sequence, Int $n where $_ > 0 = 1 ){
my $is-lazy = sequence.is-lazy;
my \iterator = sequence.iterator;
# gather is used with take
gather loop {
my Mu #current;
my \result = iterator.push-exactly(#current,$n);
# put it into the sequence, and yield
take #current.List;
last if result =:= IterationEnd;
}.lazy-if($is-lazy)
}
For kicks let's try it against an infinite list of the fibonacci sequence
my $fib = (1, 1, *+* ... *);
my #output = at-a-time( $fib, 3 );
.say for #output[^5]; # just print out the first 5
(1 1 2)
(3 5 8)
(13 21 34)
(55 89 144)
(233 377 610)
Notice that I used $fib instead of #fib. It was to prevent Perl6 from caching the elements of the Fibonacci sequence.
It might be a good idea to put it into a subroutine to create a new sequence everytime you need one, so that the values can get garbage collected when you are done with them.
I also used .is-lazy and .lazy-if to mark the output sequence lazy if the input sequence is. Since it was going into an array #output it would have tried to generate all of the elements from an infinite list before continuing onto the next line.
Wait a minute, I just remembered .rotor.
my #output = $fib.rotor(3);
.say for #output[^5]; # just print out the first 5
(1 1 2)
(3 5 8)
(13 21 34)
(55 89 144)
(233 377 610)
.rotor is actually far more powerful than I've demonstrated.
If you want it to return a partial match at the end you will need to add a :partial to the arguments of .rotor.
Use the spart function from the List::NSect package on CPAN.
perl -e '
use List::NSect qw{spart};
use Data::Dumper qw{Dumper};
my #array = ("foo", "bar", "qux", "foo1", "bar", "qux2", 3, 4, 5);
my $var = spart(3, #array);
print Dumper $var;
'
$VAR1 = [
[
'foo',
'bar',
'qux'
],
[
'foo1',
'bar',
'qux2'
],
[
3,
4,
5
]
];
Below a more generic solution to the problem:
my #array = ('foo', 'bar', 1, 2);
my $n = 3;
my #VAR = map { [] } 1..$n;
my #idx = sort map { $_ % $n } 0..$#array;
for my $i ( 0..$#array ){
push #VAR[ $idx[ $i ] ], #array[ $i ];
}
This also works when the number of items in the array is not a factor of 3.
In the above example, the other solutions with e.g. splice would produce two arrays of length 2 and one of length 0.

Is there an elegant zip to interleave two lists in Perl 5?

I recently "needed" a zip function in Perl 5 (while I was thinking about How do I calculate relative time?), i.e. a function that takes two lists and "zips" them together to one list, interleaving the elements.
(Pseudo)example:
#a=(1, 2, 3);
#b=('apple', 'orange', 'grape');
zip #a, #b; # (1, 'apple', 2, 'orange', 3, 'grape');
Haskell has zip in the Prelude and Perl 6 has a zip operator built in, but how do you do it in an elegant way in Perl 5?
Assuming you have exactly two lists and they are exactly the same length, here is a solution originally by merlyn (Randal Schwartz), who called it perversely perlish:
sub zip2 {
my $p = #_ / 2;
return #_[ map { $_, $_ + $p } 0 .. $p - 1 ];
}
What happens here is that for a 10-element list, first, we find the pivot point in the middle, in this case 5, and save it in $p. Then we make a list of indices up to that point, in this case 0 1 2 3 4. Next we use map to pair each index with another index that’s at the same distance from the pivot point as the first index is from the start, giving us (in this case) 0 5 1 6 2 7 3 8 4 9. Then we take a slice from #_ using that as the list of indices. This means that if 'a', 'b', 'c', 1, 2, 3 is passed to zip2, it will return that list rearranged into 'a', 1, 'b', 2, 'c', 3.
This can be written in a single expression along ysth’s lines like so:
sub zip2 { #_[map { $_, $_ + #_/2 } 0..(#_/2 - 1)] }
Whether you’d want to use either variation depends on whether you can see yourself remembering how they work, but for me, it was a mind expander.
The List::MoreUtils module has a zip/mesh function that should do the trick:
use List::MoreUtils qw(zip);
my #numbers = (1, 2, 3);
my #fruit = ('apple', 'orange', 'grape');
my #zipped = zip #numbers, #fruit;
Here is the source of the mesh function:
sub mesh (\#\#;\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#\#) {
my $max = -1;
$max < $#$_ && ($max = $#$_) for #_;
map { my $ix = $_; map $_->[$ix], #_; } 0..$max;
}
I find the following solution straightforward and easy to read:
#a = (1, 2, 3);
#b = ('apple', 'orange', 'grape');
#zipped = map {($a[$_], $b[$_])} (0 .. $#a);
I believe it's also faster than solutions that create the array in a wrong order first and then use slice to reorder, or solutions that modify #a and #b.
For arrays of the same length:
my #zipped = ( #a, #b )[ map { $_, $_ + #a } ( 0 .. $#a ) ];
my #l1 = qw/1 2 3/;
my #l2 = qw/7 8 9/;
my #out;
push #out, shift #l1, shift #l2 while ( #l1 || #l2 );
If the lists are a different length, this will put 'undef' in the extra slots but you can easily remedy this if you don't wish to do this. Something like ( #l1[0] && shift #l1 ) would do it.
Hope this helps!
Algorithm::Loops is really nice if you do much of this kind of thing.
My own code:
sub zip { #_[map $_&1 ? $_>>1 : ($_>>1)+($#_>>1), 1..#_] }
This is totally not an elegant solution, nor is it the best solution by any stretch of the imagination. But it's fun!
package zip;
sub TIEARRAY {
my ($class, #self) = #_;
bless \#self, $class;
}
sub FETCH {
my ($self, $index) = #_;
$self->[$index % #$self][$index / #$self];
}
sub STORE {
my ($self, $index, $value) = #_;
$self->[$index % #$self][$index / #$self] = $value;
}
sub FETCHSIZE {
my ($self) = #_;
my $size = 0;
#$_ > $size and $size = #$_ for #$self;
$size * #$self;
}
sub CLEAR {
my ($self) = #_;
#$_ = () for #$self;
}
package main;
my #a = qw(a b c d e f g);
my #b = 1 .. 7;
tie my #c, zip => \#a, \#b;
print "#c\n"; # ==> a 1 b 2 c 3 d 4 e 5 f 6 g 7
How to handle STORESIZE/PUSH/POP/SHIFT/UNSHIFT/SPLICE is an exercise left to the reader.