How to grep only defined values from perl array [closed] - perl

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 2 years ago.
Improve this question
I have a lot of arrays that have some defined array elements, and then some undefined elements.
This problem does not show in simple cases like this:
my #x = sort {$a <=> $b} grep {defined} #{ $args{data} };
but this is allowing undefined values to get through in complex cases like this:
sub boxplot_series {
my %args = (
output_type => 'eps', #_ # defaults
);
my #labels = sort keys %{ $args{data} };# $args{data} should be a hash of arrays
my ($data_fh, $data_filename) = tempfile(UNLINK => 1);
my $x = 1;
foreach my $set (#labels) {
my $n = scalar #{ $args{data}{$set} };
if ($n == 0) {
confess "$set has no values.\n";
}
my #x = sort {$a <=> $b} grep {defined $_} #{ $args{data}{$set} };
if (grep undef #x) {
confess "\#x has undef.";
}
if (scalar #x == 0) { next }
my $n4 = floor(($n+3)/2)/2;
my #d = (1, $n4, ($n +1)/2, $n+1-$n4, $n);#d <- c(1, n4, (n + 1)/2, n + 1 - n4, n)
my #sum_array = $x;
$x++;
foreach my $e (0..4) {
my $floor = floor($d[$e]-1);
my $ceil = ceil($d[$e]-1);
# undef values can get through here too
push #sum_array, (0.5 * ($x[$floor] + $x[$ceil]));
}
....
How can I grep only defined elements from Perl arrays?

It works correctly.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my %args = (data => [4, 2, undef, 3, undef, 1]);
my #x = sort {$a <=> $b} grep {defined} #{ $args{data} };
say for #x;
But it breaks if you specify
my $a;
before the sort in the same scope. That's because $a (and $b) are special global variables and you shouldn't redeclare them. Now you know why.
With warnings on, you also get
"my $a" used in sort comparison at ...

Related

Hash key value changing to array reference after subroutine in perl

I'm creating keys outside and inside a subroutine on the same hash. However, after the subroutine, the values in the keys I created before the subroutine is called, are now interpreted as array references.
#!/usr/bin/perl
use module;
use strict;
use warnings;
my %hash;
my $count = 0;
my #array = ("a", "b", "c", "d");
for my $letter (#array) {
$hash{$letter} = $count;
$count++;
}
# need "\" to pass in hash otherwise changes
# will get lost outside of subroutine
foreach my $x (sort keys %hash) {
print "first $hash{$x}\n";
}
module::add_ten(\%hash);
foreach my $p (sort keys %hash) {
# $hash{$p} is printing array references, but before it was
# printing the value I desired. What did the subroutine do?
print "second $hash{$p} $hash{$p}->{ten}\n";
}
and here is the module with the subroutine
package module;
sub add_ten {
my $count = 10;
# this passes the full array as reference
my ($hash_ref) = #_; # $hash_ref is actually %hash (yes, the % is not a typo)
my #keys = keys $hash_ref;
foreach my $ltr (sort keys $hash_ref) {
$hash_ref->{$ltr} = { ten => $count };
$count++;
}
}
1;
here is the output:
first 0
first 1
first 2
first 3
second HASH(0x7ff0c3049c50) 10
second HASH(0x7ff0c3049bc0) 11
second HASH(0x7ff0c3049b90) 12
second HASH(0x7ff0c3049b60) 13
I'm expecting the output to be:
first 0
first 1
first 2
first 3
second 0 10
second 1 11
second 2 12
second 3 13
I modified my module:
package module;
sub add_ten {
my $count = 10;
# this passes the full array as reference
my ($hash_ref) = #_; # $hash_ref is actually %hash (yes, the % is not a typo)
my #keys = keys $hash_ref;
foreach my $ltr (sort keys $hash_ref) {
$hash_ref->{$ltr}{ten}=$count;
$count++;
}
}
1;
and the main script (needed to comment out use strict to get it to work):
#!/usr/bin/perl
use module;
#use strict;
use warnings;
my %hash;
my $count = 0;
my #array = ("a", "b", "c", "d");
for my $letter (#array) {
$hash{$letter} = $count;
$count++;
}
# need "\" to pass in hash otherwise changes
# will get lost outside of subroutine
foreach my $x (sort keys %hash) {
print "first $hash{$x}\n";
}
module::add_ten(\%hash);
foreach my $p (sort keys %hash) {
print "second $hash{$p} $hash{$p}{ten}\n";
}
But this is what I was trying to get to.
$hash_ref is a reference to %hash, so when you change the values of the elements of the hash referenced by $hash_ref, you're changing the values of the hash %hash.
That means that when you do
$hash_ref->{$ltr} = { ten => $count };
You are doing
$hash{a} = { ten => 10 };
It should be no surprise that $hash{a} no longer contains zero. You'll have to change your data structure. You could use the following:
$hash{a}{value} = 0;
$hash{a}{subhash}{ten} = 10;

How can I get this basic Perl sub program that sorts to work properly?

I am brand new to Perl. Can someone help me out and give me a tip or a solution on how to get this sorting sub program to work. I know it has something to do with how arrays are passed to sub programs. I searched online and did not find an answer that I was satisfied with... I also like the suggestions the helpful S.O. users give me too. I would like to have the program print the sorted array in the main sub program. Currently, it is printing the elements of the array #a in original order. I want the sub program to modify the array so when I print the array it is in sorted order. Any suggestions are appreciated. Of course, I want to see the simplest way to fix this.
sub sort {
my #array = #_;
my $i;
my $j;
my $iMin;
for ( $i = 0; $i < #_ - 1; $i++ ) {
$iMin = $i;
for ( $j = $i + 1; $j < #_; $j++ ) {
if ( $array[$j] < $array[$iMin] ) {
$iMin = $j;
}
}
if ( $iMin != $i ) {
my $temp = $array[$i];
$array[$i] = $array[$iMin];
$array[$iMin] = $temp;
}
}
}
Then call from a main sub program:
sub main {
my #a = (-23,3,234,-45,0,32,12,54,-10000,1);
&sort(#a);
my $i;
for ( $i = 0; $i < #a; $i++ ) {
print "$a[$i]\n";
}
}
main;
When your sub does the following assignment my #array = #_, it is creating a copy of the passed contents. Therefore any modifications to the values of #array will not effect #a outside your subroutine.
Following the clarification that this is just a personal learning exercise, there are two solutions.
1) You can return the sorted array and assign it to your original variable
sub mysort {
my #array = #_;
...
return #array;
}
#a = mysort(#a)
2) Or you can pass a reference to the array, and work on the reference:
sub mysort {
my $arrayref = shift;
...
}
mysort(\#a)
Also, it's probably a good idea to not use a sub named sort since that's that's a builtin function. Duplicating your code using perl's sort:
#a = sort {$a <=> $b} #a;
Also, the for loops inside your sub should be rewritten to utilize the last index of an #array, which is written as $#array, and the range operator .. which is useful for incrementors :
for ( my $j = $i + 1; $j <= $#array; $j++ ) {
# Or simpler:
for my $j ($i+1 .. $#array) {
And finally, because you're new, I should pass on that all your scripts should start with use strict; and use warnings;. For reasons why: Why use strict and warnings?
With very few, rare exceptions the simplest (and easiest) way to sort stuff in perl is simply to use the sort builtin.
sort takes an optional argument, either a block or a subname, which can be used to control how sort evaluates which of the two elements it is comparing at any given moment is greater.
See sort on perldoc for further information.
If you require a "natural" sort function, where you get the sequence 0, 1, 2, 3, ... instead of 0, 1, 10, 11, 12, 2, 21, 22, 3, ..., then use the perl module Sort::Naturally which is available on CPAN (and commonly available as a package on most distros).
In your case, if you need a pure numeric sort, the following will be quite sufficient:
use Sort::Naturally; #Assuming Sort::Naturally is installed
sub main {
my #a = (-23,3,234,-45,0,32,12,54,-10000,1);
#Choose one of the following
#a = sort #a; #Sort in "ASCII" ascending order
#a = sort { $b cmp $a } #a; #Sort in reverse of the above
#a = nsort #a; #Sort in "natural" order
#a = sort { ncmp($b, $a) } #a; #Reverse of the above
print "$_\n" foreach #a; #To see what you actually got
}
It is also worth mentioning the use sort 'stable'; pragma which can be used to ensure that sorting occurs using a stable algorithm, meaning that elements which are equal will not be rearranged relative to one another.
As a bonus, you should be aware that sort can be used to sort data structures as well as simple scalars:
#Assume #a is an array of hashes
#a = sort { $a->{name} cmp $b->{name} } #; #Sort #a by name key
#Sort #a by name in ascending order and date in descending order
#a = sort { $a->{name} cmp $b->{name} || $b->{date} cmp $a->{date} } #a;
#Assume #a is an array of arrays
#Sort #a by the 2nd element of the arrays it contains
#a = sort { $a->[1] cmp $b->[1] } #a;
#Assume #a is an array of VERY LONG strings
#Sort #a alphanumerically, but only care about
#the first 1,000 characters of each string
#a = sort { substr($a, 0, 1000) cmp substr($b, 0, 1000) } #a;
#Assume we want to "sort" an array without modifying it:
#Yes, the names here are confusing. See below.
my #idxs = sort { $a[$a] cmp $a[$b] } (0..$#a);
print "$a[$_]\n" foreach #idxs;
##idxs contains the indexes to #a, in the order they would have
#to be read from #a in order to get a sorted version of #a
As a final note, please remember that $a and $b are special variables in perl, which are pre-populated in the context of a sorting sub or sort block; the upshot is that if you're working with sort you can always expect $a and $b to contain the next two elements being compared, and should use them accordingly, but do NOT do my $a;, e.g., or use variables with either name in non-sort-related stuff. This also means that naming things %a or #a, or %b or #b, can be confusing -- see the final section of my example above.

In Perl, how can I iterate over multiple elements of an array?

I have a CSV file that I use split to parse into an array of N items, where N is a multiple of 3.
Is there a way i can do this
foreach my ( $a, $b, $c ) ( #d ) {}
similar to Python?
I addressed this issue in my module List::Gen on CPAN.
use List::Gen qw/by/;
for my $items (by 3 => #list) {
# do something with #$items which will contain 3 element slices of #list
# unlike natatime or other common solutions, the elements in #$items are
# aliased to #list, just like in a normal foreach loop
}
You could also import the mapn function, which is used by List::Gen to implement by:
use List::Gen qw/mapn/;
mapn {
# do something with the slices in #_
} 3 => #list;
You can use List::MoreUtils::natatime. From the docs:
my #x = ('a' .. 'g');
my $it = natatime 3, #x;
while (my #vals = $it->()) {
print "#vals\n";
}
natatime is implemented in XS so you should prefer it for efficiency. Just for illustration purposes, here is how one might implement a three element iterator generator in Perl:
#!/usr/bin/perl
use strict; use warnings;
my #v = ('a' .. 'z' );
my $it = make_3it(\#v);
while ( my #tuple = $it->() ) {
print "#tuple\n";
}
sub make_3it {
my ($arr) = #_;
{
my $lower = 0;
return sub {
return unless $lower < #$arr;
my $upper = $lower + 2;
#$arr > $upper or $upper = $#$arr;
my #ret = #$arr[$lower .. $upper];
$lower = $upper + 1;
return #ret;
}
}
}
my #list = (qw(one two three four five six seven eight nine));
while (my ($m, $n, $o) = splice (#list,0,3)) {
print "$m $n $o\n";
}
this outputs:
one two three
four five six
seven eight nine
#z=(1,2,3,4,5,6,7,8,9,0);
for( #tuple=splice(#z,0,3); #tuple; #tuple=splice(#z,0,3) )
{
print "$tuple[0] $tuple[1] $tuple[2]\n";
}
produces:
1 2 3
4 5 6
7 8 9
0
Not easily. You'd be better off making #d an array of three-element tuples, by pushing the elements onto the array as an array reference:
foreach my $line (<>)
push #d, [ split /,/, $line ];
(Except that you really ought to use one of the CSV modules from CPAN.
As of Perl v5.36 you can do exactly that:
foreach my ( $a, $b, $c ) ( #d ) { ... }
It's implemented as for_list experimental feature, so you can ignore the warning the usual way with use experimental qw(for_list);
For versions before v5.36 we'll rely on while/splice as mentioned above.

perl: iterate over a typeglob

Given a typeglob, how can I find which types are actually defined?
In my application, we user PERL as a simple configuration format.
I'd like to require() the user config file, then be able to see which variables are defined, as well as what types they are.
Code: (questionable quality advisory)
#!/usr/bin/env perl
use strict;
use warnings;
my %before = %main::;
require "/path/to/my.config";
my %after = %main::;
foreach my $key (sort keys %after) {
next if exists $before{$symbol};
local *myglob = $after{$symbol};
#the SCALAR glob is always defined, so we check the value instead
if ( defined ${ *myglob{SCALAR} } ) {
my $val = ${ *myglob{SCALAR} };
print "\$$symbol = '".$val."'\n" ;
}
if ( defined *myglob{ARRAY} ) {
my #val = #{ *myglob{ARRAY} };
print "\#$symbol = ( '". join("', '", #val) . "' )\n" ;
}
if ( defined *myglob{HASH} ) {
my %val = %{ *myglob{HASH} };
print "\%$symbol = ( ";
while( my ($key, $val) = each %val ) {
print "$key=>'$val', ";
}
print ")\n" ;
}
}
my.config:
#A = ( a, b, c );
%B = ( b=>'bee' );
$C = 'see';
output:
#A = ( 'a', 'b', 'c' )
%B = ( b=>'bee', )
$C = 'see'
$_<my.config = 'my.config'
In the fully general case, you can't do what you want thanks to the following excerpt from perlref:
*foo{THING} returns undef if that particular THING hasn't been used yet, except in the case of scalars. *foo{SCALAR} returns a reference to an anonymous scalar if $foo hasn't been used yet. This might change in a future release.
But if you're willing to accept the restriction that any scalar must have a defined value to be detected, then you might use code such as
#! /usr/bin/perl
use strict;
use warnings;
open my $fh, "<", \$_; # get DynaLoader out of the way
my %before = %main::;
require "my.config";
my %after = %main::;
foreach my $name (sort keys %after) {
unless (exists $before{$name}) {
no strict 'refs';
my $glob = $after{$name};
print "\$$name\n" if defined ${ *{$glob}{SCALAR} };
print "\#$name\n" if defined *{$glob}{ARRAY};
print "%$name\n" if defined *{$glob}{HASH};
print "&$name\n" if defined *{$glob}{CODE};
print "$name (format)\n" if defined *{$glob}{FORMAT};
print "$name (filehandle)\n" if defined *{$glob}{IO};
}
}
will get you there.
With my.config of
$JACKPOT = 3_756_788;
$YOU_CANT_SEE_ME = undef;
#OPTIONS = qw/ apple cherries bar orange lemon /;
%CREDITS = (1 => 1, 5 => 6, 10 => 15);
sub is_jackpot {
local $" = ""; # " fix Stack Overflow highlighting
"#_[0,1,2]" eq "barbarbar";
}
open FH, "<", \$JACKPOT;
format WinMessage =
You win!
.
the output is
%CREDITS
FH (filehandle)
$JACKPOT
#OPTIONS
WinMessage (format)
&is_jackpot
Printing the names takes a little work, but we can use the Data::Dumper module to take part of the burden. The front matter is similar:
#! /usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
sub _dump {
my($ref) = #_;
local $Data::Dumper::Indent = 0;
local $Data::Dumper::Terse = 1;
scalar Dumper $ref;
}
open my $fh, "<", \$_; # get DynaLoader out of the way
my %before = %main::;
require "my.config";
my %after = %main::;
We need to dump the various slots slightly differently and in each case remove the trappings of references:
my %dump = (
SCALAR => sub {
my($ref,$name) = #_;
return unless defined $$ref;
"\$$name = " . substr _dump($ref), 1;
},
ARRAY => sub {
my($ref,$name) = #_;
return unless defined $ref;
for ("\#$name = " . _dump $ref) {
s/= \[/= (/;
s/\]$/)/;
return $_;
}
},
HASH => sub {
my($ref,$name) = #_;
return unless defined $ref;
for ("%$name = " . _dump $ref) {
s/= \{/= (/;
s/\}$/)/;
return $_;
}
},
);
Finally, we loop over the set-difference between %before and %after:
foreach my $name (sort keys %after) {
unless (exists $before{$name}) {
no strict 'refs';
my $glob = $after{$name};
foreach my $slot (keys %dump) {
my $var = $dump{$slot}(*{$glob}{$slot},$name);
print $var, "\n" if defined $var;
}
}
}
Using the my.config from your question, the output is
$ ./prog.pl
#A = ('a','b','c')
%B = ('b' => 'bee')
$C = 'see'
Working code using a CPAN module that gets some of the hair out of the way, Package::Stash. As noted in my comment to gbacon's answer, this is blind to the config file doing $someval = undef but that seems to be unavoidable, and at least the other cases are caught. It also limits itself to the SCALAR, ARRAY, HASH, CODE, and IO types -- getting GLOB and FORMAT is possible but it makes the code less pretty and also creates noise in the output :)
#!perl
use strict;
use warnings;
use Package::Stash;
sub all_vars_in {
my ($package) = #_;
my #ret;
my $stash = Package::Stash->new($package);
for my $sym ($stash->list_all_package_symbols) {
for my $sigil (qw($ # % &), '') {
my $fullsym = "$sigil$sym";
push #ret, $fullsym if $stash->has_package_symbol($fullsym);
}
}
#ret;
}
my %before;
$before{$_} ++ for all_vars_in('main');
require "my.config";
for my $var (all_vars_in('main')) {
print "$var\n" unless exists $before{$var};
}
Beginning in 5.010, you can distinguish whether a SCALAR exists using the B introspection module; see Detecting declared package variables in perl
Update: example copied from that answer:
# package main;
our $f;
sub f {}
sub g {}
use B;
use 5.010;
if ( ${ B::svref_2object(\*f)->SV } ) {
say "f: Thar be a scalar tharrr!";
}
if ( ${ B::svref_2object(\*g)->SV } ) {
say "g: Thar be a scalar tharrr!";
}
1;
UPDATE:
gbacon is right. *glob{SCALAR} is defined.
Here is the output I get using your code:
Name "main::glob" used only once:
possible typo at
test_glob_foo_thing.pl line 13.
'FOO1' (SCALAR)
'FOO1' (GLOB)
'FOO2' (SCALAR)
'FOO2' (GLOB)
'_<my.config' (SCALAR)
'_<my.config' (GLOB)
This is despite FOO2 being defined as a hash, but not as a scalar.
ORIGINAL ANSWER:
If I understand you correctly, you simply need to use the defined built-in.
#!/usr/bin/env perl
use strict;
use warnings;
my %before = %main::;
require "/path/to/my.config";
my %after = %main::;
foreach my $key (sort keys %after) {
if (not exists $before{$key}) {
if(defined($after{$key}){
my $val = $after{$key};
my $what = ref($val);
print "'$key' ($what)\n";
}
}
}
I hate to ask, but instead of messing around with typeglobs, why not switch to a real configuration format? e.g. check out Config::Simple and YAML.
I wouldn't recommend messing around with typeglobs and symbol tables in normal cases (some CPAN modules do that, but only at the bottom levels of large systems - e.g. Moose in the lowest levels of Class::MOP). Perl gives you a lot of rope to work with, but that rope is also quite happy to self-noosify and self-tie-around-your-neck if you're not careful :)
See also: How do you manage configuration files in Perl?
no strict 'refs';
my $func_name = 'myfunc';
*{$func_name}{CODE}()
use strict 'refs';
If you don't mind parsing Data::Dump output, you could use it to tease out the differences.
use strict;
use warnings;
use Data::Dump qw{ dump };
my %before = %main::;
require "my.config";
my %after = %main::;
foreach my $key ( sort keys %after ) {
if ( not exists $before{$key} ) {
my $glob = $after{$key};
print "'$key' " . dump( $glob) . "\n";
}
}
Using this code with the following config file:
$FOO1 = 3;
$FOO2 = 'my_scalar';
%FOO2 = ( a=>'b', c=>'d' );
#FOO3 = ( 1 .. 5);
$FOO4 = [ 1 .. 5 ];
I believe that this output provides enough information to be able to figure out which parts of each type glob are defined:
'FOO1' do {
my $a = *main::FOO1;
$a = \3;
$a;
}
'FOO2' do {
my $a = *main::FOO2;
$a = \"my_scalar";
$a = { a => "b", c => "d" };
$a;
}
'FOO3' do {
my $a = *main::FOO3;
$a = [1 .. 5];
$a;
}
'FOO4' do {
my $a = *main::FOO4;
$a = \[1 .. 5];
$a;
}
'_<my.config' do {
my $a = *main::_<my.config;
$a = \"my.config";
$a;
}

Difference of Two Arrays Using Perl

I have two arrays. I need to check and see if the elements of one appear in the other one.
Is there a more efficient way to do it than nested loops? I have a few thousand elements in each and need to run the program frequently.
Another way to do it is to use Array::Utils
use Array::Utils qw(:all);
my #a = qw( a b c d );
my #b = qw( c d e f );
# symmetric difference
my #diff = array_diff(#a, #b);
# intersection
my #isect = intersect(#a, #b);
# unique union
my #unique = unique(#a, #b);
# check if arrays contain same members
if ( !array_diff(#a, #b) ) {
# do something
}
# get items from array #a that are not in array #b
my #minus = array_minus( #a, #b );
perlfaq4 to the rescue:
How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array:
#union = #intersection = #difference = ();
%count = ();
foreach $element (#array1, #array2) { $count{$element}++ }
foreach $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
If you properly declare your variables, the code looks more like the following:
my %count;
for my $element (#array1, #array2) { $count{$element}++ }
my ( #union, #intersection, #difference );
for my $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
You need to provide a lot more context. There are more efficient ways of doing that ranging from:
Go outside of Perl and use shell (sort + comm)
map one array into a Perl hash and then loop over the other one checking hash membership. This has linear complexity ("M+N" - basically loop over each array once) as opposed to nested loop which has "M*N" complexity)
Example:
my %second = map {$_=>1} #second;
my #only_in_first = grep { !$second{$_} } #first;
# use a foreach loop with `last` instead of "grep"
# if you only want yes/no answer instead of full list
Use a Perl module that does the last bullet point for you (List::Compare was mentioned in comments)
Do it based on timestamps of when elements were added if the volume is very large and you need to re-compare often. A few thousand elements is not really big enough, but I recently had to diff 100k sized lists.
You can try Arrays::Utils, and it makes it look nice and simple, but it's not doing any powerful magic on the back end. Here's the array_diffs code:
sub array_diff(\#\#) {
my %e = map { $_ => undef } #{$_[1]};
return #{[ ( grep { (exists $e{$_}) ? ( delete $e{$_} ) : ( 1 ) } #{ $_[0] } ), keys %e ] };
}
Since Arrays::Utils isn't a standard module, you need to ask yourself if it's worth the effort to install and maintain this module. Otherwise, it's pretty close to DVK's answer.
There are certain things you must watch out for, and you have to define what you want to do in that particular case. Let's say:
#array1 = qw(1 1 2 2 3 3 4 4 5 5);
#array2 = qw(1 2 3 4 5);
Are these arrays the same? Or, are they different? They have the same values, but there are duplicates in #array1 and not #array2.
What about this?
#array1 = qw( 1 1 2 3 4 5 );
#array2 = qw( 1 1 2 3 4 5 );
I would say that these arrays are the same, but Array::Utils::arrays_diff begs to differ. This is because Array::Utils assumes that there are no duplicate entries.
And, even the Perl FAQ pointed out by mob also says that It assumes that each element is unique in a given array. Is this an assumption you can make?
No matter what, hashes are the answer. It's easy and quick to look up a hash. The problem is what do you want to do with unique values.
Here's a solid solution that assumes duplicates don't matter:
sub array_diff {
my #array1 = #{ shift() };
my #array2 = #{ shift() };
my %array1_hash;
my %array2_hash;
# Create a hash entry for each element in #array1
for my $element ( #array1 ) {
$array1_hash{$element} = #array1;
}
# Same for #array2: This time, use map instead of a loop
map { $array_2{$_} = 1 } #array2;
for my $entry ( #array2 ) {
if ( not $array1_hash{$entry} ) {
return 1; #Entry in #array2 but not #array1: Differ
}
}
if ( keys %array_hash1 != keys %array_hash2 ) {
return 1; #Arrays differ
}
else {
return 0; #Arrays contain the same elements
}
}
If duplicates do matter, you'll need a way to count them. Here's using map not just to create a hash keyed by each element in the array, but also count the duplicates in the array:
my %array1_hash;
my %array2_hash;
map { $array1_hash{$_} += 1 } #array1;
map { $array2_hash{$_} += 2 } #array2;
Now, you can go through each hash and verify that not only do the keys exist, but that their entries match
for my $key ( keys %array1_hash ) {
if ( not exists $array2_hash{$key}
or $array1_hash{$key} != $array2_hash{$key} ) {
return 1; #Arrays differ
}
}
You will only exit the for loop if all of the entries in %array1_hash match their corresponding entries in %array2_hash. Now, you have to show that all of the entries in %array2_hash also match their entries in %array1_hash, and that %array2_hash doesn't have more entries. Fortunately, we can do what we did before:
if ( keys %array2_hash != keys %array1_hash ) {
return 1; #Arrays have a different number of keys: Don't match
}
else {
return; #Arrays have the same keys: They do match
}
You can use this for getting diffrence between two arrays
#!/usr/bin/perl -w
use strict;
my #list1 = (1, 2, 3, 4, 5);
my #list2 = (2, 3, 4);
my %diff;
#diff{ #list1 } = undef;
delete #diff{ #list2 };
You want to compare each element of #x against the element of the same index in #y, right? This will do it.
print "Index: $_ => \#x: $x[$_], \#y: $y[$_]\n"
for grep { $x[$_] != $y[$_] } 0 .. $#x;
...or...
foreach( 0 .. $#x ) {
print "Index: $_ => \#x: $x[$_], \#y: $y[$_]\n" if $x[$_] != $y[$_];
}
Which you choose kind of depends on whether you're more interested in keeping a list of indices to the dissimilar elements, or simply interested in processing the mismatches one by one. The grep version is handy for getting the list of mismatches. (original post)
n + n log n algorithm, if sure that elements are unique in each array (as hash keys)
my %count = ();
foreach my $element (#array1, #array2) {
$count{$element}++;
}
my #difference = grep { $count{$_} == 1 } keys %count;
my #intersect = grep { $count{$_} == 2 } keys %count;
my #union = keys %count;
So if I'm not sure of unity and want to check presence of the elements of array1 inside array2,
my %count = ();
foreach (#array1) {
$count{$_} = 1 ;
};
foreach (#array2) {
$count{$_} = 2 if $count{$_};
};
# N log N
if (grep { $_ == 1 } values %count) {
return 'Some element of array1 does not appears in array2'
} else {
return 'All elements of array1 are in array2'.
}
# N + N log N
my #a = (1,2,3);
my #b=(2,3,1);
print "Equal" if grep { $_ ~~ #b } #a == #b;
Not elegant, but easy to understand:
#!/usr/local/bin/perl
use strict;
my $file1 = shift or die("need file1");
my $file2 = shift or die("need file2");;
my #file1lines = split/\n/,`cat $file1`;
my #file2lines = split/\n/,`cat $file2`;
my %lines;
foreach my $file1line(#file1lines){
$lines{$file1line}+=1;
}
foreach my $file2line(#file2lines){
$lines{$file2line}+=2;
}
while(my($key,$value)=each%lines){
if($value == 1){
print "$key is in only $file1\n";
}elsif($value == 2){
print "$key is in only $file2\n";
}elsif($value == 3){
print "$key is in both $file1 and $file2\n";
}
}
exit;
__END__
Try to use List::Compare. IT has solutions for all the operations that can be performed on arrays.