Compare 2 arrays for intersect, diff and common values - Perl

I want to compare 2 arrays and get the diff, common and intersect values, but the code below is not working.
There is no error message; all I can see is "Array" as a value even though I am printing $difference[0], so I doubt the code is correct.
sub updatedevice() {
my $n = {};
my $Device_LINK = $server->object("ICF_PersistentDataSet",$devicelinks);
my $Temp_Device_LINK = $server->object("ICF_PersistentDataSet",$tempdevicelinks);
my @current_devicelist = @{ $Device_LINK->invoke("get") };
my @temp_devicelist = @{ $Temp_Device_LINK->invoke("get") };
my %temp_list;
my %current_list;
my $size = @current_devicelist;
for ($n=0; $n < $size; $n++) {
our $device=$current_devicelist[$n][0];
DEBUG( "DEBUG: - devicelinks values $device " ); --- > able to print this value of device "ABC-DCFE41->90"
my $size = @temp_devicelist;
for ($n=0; $n < $size; $n++) {
our $tempdevicelinks=$temp_devicelist[$n][0];
DEBUG( "DEBUG: - temp plc links values $tempdevicelinks " ); --- > able to print this value of device "GHJKL-poiu->78"
my %count = ();
foreach my $device (@current_devicelist, @temp_devicelist) {
$count{$device}++;
}
my @difference = grep { $count{$_} == 1 } keys %count;
my @intersect = grep { $count{$_} == 2 } keys %count;
my @union = keys %count;
DEBUG( "DEBUG: - difference links values $difference[0] " );
DEBUG( "DEBUG: - intersect links values $intersect[0] " );
DEBUG( "DEBUG: - union links values $union[0] " );
}
}
}

The problem is that you're assigning an array reference (returned from invoke) to an array.
Your statement of "see 'array' as a value" is a dead giveaway that you're manipulating array references (instead of arrays) - when printed, they turn into strings like this: 'ARRAY(0x349014)'
The problem is that you're taking an array reference (a scalar), and assigning it to an array - which imposes list context on your value, and turns that scalar into a list with its only element being that scalar. Thus you simply store the array reference as the first and only element of the array - instead of storing the list of values that's being referenced like you intended.
To demonstrate:
my @current_devicelist = (1,3); # Assign real arrays
my @temp_devicelist = (2,3);
my %count = ();
foreach my $device (@current_devicelist, @temp_devicelist) {
$count{$device}++;
}
my @difference = grep { $count{$_} == 1 } keys %count;
my @intersect = grep { $count{$_} == 2 } keys %count;
my @union = keys %count;
use Data::Dumper;
print Data::Dumper->Dump([\@difference, \@intersect, \@union]
,["difference","intersect","union"]);
This prints:
$difference = [
'1',
'2'
];
$intersect = [
'3'
];
$union = [
'1',
'3',
'2'
];
Now, if you mimic what your code was doing instead by changing the first 2 lines to:
my @current_devicelist = [1,3]; # Assign reference
# Works the same as if you said
# my @current_devicelist = ([1,3]);
# or
# $current_devicelist[0] = [1,3];
my @temp_devicelist = [2,3];
... you get:
$difference = [
'ARRAY(0x349014)',
'ARRAY(0x349114)'
];
$intersect = [];
$union = [
'ARRAY(0x349014)',
'ARRAY(0x349114)'
];
To fix your problem, you can do one of 4 things:
Simply dereference your returned array references, using @{} dereference:
my @current_devicelist = @{ $Device->invoke("get") };
my @temp_devicelist = @{ $Temp_Device->invoke("get") };
Change invoke() method - if you can - to return an array instead of array reference:
# Old code:
# return $myArrRef;
# New Code:
return @$myArrRef;
Change invoke() method - if you can - to return an array OR an arrayref based on context (using wantarray):
# Old code:
# return $myArrRef;
# New Code:
return wantarray ? @$myArrRef : $myArrRef;
Change your code to use array references
my $current_devicelist = $Device->invoke("get");
my $temp_devicelist = $Temp_Device->invoke("get");
my %count = ();
foreach my $device (@$current_devicelist, @$temp_devicelist) {
$count{$device}++;
}
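One more detail is worth checking: the debug loop in the question reads $current_devicelist[$n][0], which suggests that each element of the returned list is itself a row (an array reference). If so, counting the rows directly would still key the hash on 'ARRAY(0x...)' strings. The following is only a sketch under that assumption; the literal data stands in for the two invoke("get") calls, and the row layout (device name in column 0) is a guess based on the question.

```perl
use strict;
use warnings;

# Stand-ins for $Device_LINK->invoke("get") and $Temp_Device_LINK->invoke("get").
# Assumption: each call returns a reference to an array of rows,
# with the device name in column 0 of every row.
my $current_rows = [ ['ABC-DCFE41->90'], ['XYZ-1->2'] ];
my $temp_rows    = [ ['GHJKL-poiu->78'], ['XYZ-1->2'] ];

# Pull the name out of each row so the hash is keyed by strings, not references.
my @current_devicelist = map { $_->[0] } @$current_rows;
my @temp_devicelist    = map { $_->[0] } @$temp_rows;

my %count;
$count{$_}++ for @current_devicelist, @temp_devicelist;

# As in the answer, this assumes no name repeats within a single list.
my @difference = sort grep { $count{$_} == 1 } keys %count;
my @intersect  = sort grep { $count{$_} == 2 } keys %count;
my @union      = sort keys %count;

print "difference: @difference\n";
print "intersect:  @intersect\n";
print "union:      @union\n";
```

If the rows turn out to be plain strings rather than array references, the two map lines can simply be replaced by @$current_rows and @$temp_rows.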

Related

How to parse multidimensional hash variable in Perl

I have the following multidimensional hash variable
my %billingMember ;
$billingMember{1}->{'useremail_quota'} = 10;
$billingMember{1}->{'useremail_blockedquota'} = 5;
$billingMember{2}->{'useremail_quota'} = 10;
$billingMember{2}->{'useremail_blockedquota'} = 5;
How can I parse the variable %billingMember?
That is, I need to get each value, like
$billingMember{1}->{'useremail_quota'},
$billingMember{1}->{'useremail_blockedquota'} ,
$billingMember{2}->{'useremail_quota'}, ....
Here 1 and 2 are just examples; the keys will be dynamic.
So I think we need to use foreach or for.
Some samples taken from http://perldoc.perl.org/perldsc.html#HASHES-OF-HASHES :
foreach $family ( keys %HoH ) {
print "$family: { ";
for $role ( keys %{ $HoH{$family} } ) {
print "$role=$HoH{$family}{$role} ";
}
print "}\n";
}
(Edit : only kept the one which will probably be useful in your case)
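Applied to the hash from the question, the same nested loop might look like the sketch below. The sorting and the @lines array are additions for illustration only, so that the output order is stable.

```perl
use strict;
use warnings;

my %billingMember;
$billingMember{1}{useremail_quota}        = 10;
$billingMember{1}{useremail_blockedquota} = 5;
$billingMember{2}{useremail_quota}        = 10;
$billingMember{2}{useremail_blockedquota} = 5;

# Outer keys are the member IDs, inner keys are the field names.
my @lines;
for my $member ( sort { $a <=> $b } keys %billingMember ) {
    for my $field ( sort keys %{ $billingMember{$member} } ) {
        push @lines, "$member: $field = $billingMember{$member}{$field}";
    }
}
print "$_\n" for @lines;
```

Because the loops use keys, nothing about the IDs 1 and 2 is hard-coded; whatever keys are present at run time get visited.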

Create a hash of array: Displaying array reference

Below is my code (just playing with hashes) where I want to create a hash of arrays (keys mapped to arrays). But the output also contains array references. Why are these array references displayed?
#!/usr/bin/perl
my @result = (0,0,0);
my @operator = ('AP', 'MP', 'UP');
my %operator_res;
for ( $i = 0; $i <= $#operator; $i++ ) {
if ( $i == 2 ) {
@result = (4,5,6);
} elsif ( $i == 1 ) {
@result = (1,2,3);
}
@{$operator_res{$operator[$i]}} = @result;
}
foreach $keys (%operator_res) {
print "$keys:";
#print "@{$operator_res{$keys}}\n";
print "$operator_res{$keys}[0], $operator_res{$keys}[1], $operator_res{$keys}[2]\n";
}
Output is
UP:4, 5, 6
ARRAY(0x17212e70):, , Why is this array reference printing?
AP:0, 0, 0
ARRAY(0x17212e00):, ,
MP:1, 2, 3
ARRAY(0x17212e20):, ,
foreach $keys (%operator_res)
should be
foreach $keys (keys %operator_res)
Your foreach loop iterates over every element of %operator_res (keys and values alike), not just over the keys. As ikagim already answered, you have to use keys to get only the keys of the hash.
If you look at %operator_res with Data::Dumper, the output is:
$VAR1 = 'UP';
$VAR2 = [
4,
5,
6
];
$VAR3 = 'AP';
$VAR4 = [
0,
0,
0
];
$VAR5 = 'MP';
$VAR6 = [
1,
2,
3
];
As you see, you will always get two iterations per element: one for the key and one for the array ref.
A hash value in Perl must be a scalar. To simulate multidimensional hashes, use values that are references to hashes or arrays.
The line
@{$operator_res{$operator[$i]}} = @result;
in your question is equivalent to
$operator_res{ $operator[$i] } = [ @result ];
That is, the value associated with the key $operator[$i] at the time is a reference to a new array whose contents are the same as @result.
For many examples, read the perllol documentation.
You could use Data::Dumper to print out your data in a well formatted way:
use Data::Dumper;
print Dumper(\%operator_res);
Q: Why is this array reference printing?
A: Because of this line: print "$keys:";
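For completeness, here is a small sketch of the corrected loop. The hash is filled with literal values matching the question's output, and @report is a hypothetical name used only to collect the lines.

```perl
use strict;
use warnings;

my %operator_res = (
    AP => [0, 0, 0],
    MP => [1, 2, 3],
    UP => [4, 5, 6],
);

# Iterate over the keys only; each value is an array reference,
# so it is dereferenced with @{ ... } before joining.
my @report;
for my $key (sort keys %operator_res) {
    push @report, "$key:" . join(', ', @{ $operator_res{$key} });
}
print "$_\n" for @report;
```

With keys in place, the loop variable is never an array reference, so no ARRAY(0x...) lines appear.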

Difference of Two Arrays Using Perl

I have two arrays. I need to check and see if the elements of one appear in the other one.
Is there a more efficient way to do it than nested loops? I have a few thousand elements in each and need to run the program frequently.
Another way to do it is to use Array::Utils
use Array::Utils qw(:all);
my @a = qw( a b c d );
my @b = qw( c d e f );
# symmetric difference
my @diff = array_diff(@a, @b);
# intersection
my @isect = intersect(@a, @b);
# unique union
my @unique = unique(@a, @b);
# check if arrays contain same members
if ( !array_diff(@a, @b) ) {
# do something
}
# get items from array @a that are not in array @b
my @minus = array_minus( @a, @b );
perlfaq4 to the rescue:
How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array:
@union = @intersection = @difference = ();
%count = ();
foreach $element (@array1, @array2) { $count{$element}++ }
foreach $element (keys %count) {
push @union, $element;
push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}
If you properly declare your variables, the code looks more like the following:
my %count;
for my $element (@array1, @array2) { $count{$element}++ }
my ( @union, @intersection, @difference );
for my $element (keys %count) {
push @union, $element;
push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}
You need to provide a lot more context. There are more efficient ways of doing that ranging from:
Go outside of Perl and use shell (sort + comm)
Map one array into a Perl hash and then loop over the other one checking hash membership. This has linear complexity ("M+N": basically loop over each array once) as opposed to a nested loop, which has "M*N" complexity.
Example:
my %second = map {$_=>1} @second;
my @only_in_first = grep { !$second{$_} } @first;
# use a foreach loop with `last` instead of "grep"
# if you only want yes/no answer instead of full list
Use a Perl module that does the last bullet point for you (List::Compare was mentioned in comments)
Do it based on timestamps of when elements were added if the volume is very large and you need to re-compare often. A few thousand elements is not really big enough, but I recently had to diff 100k sized lists.
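The comment in the example above mentions using a foreach loop with last when only a yes/no answer is needed. A minimal sketch of that variant follows; the sample data and the name $all_present are made up for illustration.

```perl
use strict;
use warnings;

my @first  = qw( a b c d );
my @second = qw( c d e f );

# Build the lookup once: one pass over @second.
my %in_second = map { $_ => 1 } @second;

# Stop at the first element of @first that is missing from @second,
# instead of collecting the full list the way grep would.
my $all_present = 1;
for my $item (@first) {
    if ( !$in_second{$item} ) {
        $all_present = 0;
        last;
    }
}
print $all_present ? "all of \@first is in \@second\n"
                   : "some of \@first is missing from \@second\n";
```

The early exit only helps when a mismatch tends to show up early; in the worst case it still walks all of @first once.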
You can try Array::Utils, and it makes it look nice and simple, but it's not doing any powerful magic on the back end. Here's the array_diff code:
sub array_diff(\@\@) {
my %e = map { $_ => undef } @{$_[1]};
return @{[ ( grep { (exists $e{$_}) ? ( delete $e{$_} ) : ( 1 ) } @{ $_[0] } ), keys %e ] };
}
Since Array::Utils isn't a standard module, you need to ask yourself if it's worth the effort to install and maintain this module. Otherwise, it's pretty close to DVK's answer.
There are certain things you must watch out for, and you have to define what you want to do in that particular case. Let's say:
@array1 = qw(1 1 2 2 3 3 4 4 5 5);
@array2 = qw(1 2 3 4 5);
Are these arrays the same? Or, are they different? They have the same values, but there are duplicates in @array1 and not @array2.
What about this?
@array1 = qw( 1 1 2 3 4 5 );
@array2 = qw( 1 1 2 3 4 5 );
I would say that these arrays are the same, but Array::Utils::array_diff begs to differ. This is because Array::Utils assumes that there are no duplicate entries.
And, even the Perl FAQ pointed out by mob also says that It assumes that each element is unique in a given array. Is this an assumption you can make?
No matter what, hashes are the answer. It's easy and quick to look up a hash. The problem is what do you want to do with unique values.
Here's a solid solution that assumes duplicates don't matter:
sub array_diff {
my @array1 = @{ shift() };
my @array2 = @{ shift() };
my %array1_hash;
my %array2_hash;
# Create a hash entry for each element in @array1
for my $element ( @array1 ) {
$array1_hash{$element} = 1;
}
# Same for @array2: This time, use map instead of a loop
map { $array2_hash{$_} = 1 } @array2;
for my $entry ( @array2 ) {
if ( not $array1_hash{$entry} ) {
return 1; #Entry in @array2 but not @array1: Differ
}
}
if ( keys %array1_hash != keys %array2_hash ) {
return 1; #Arrays differ
}
else {
return 0; #Arrays contain the same elements
}
}
If duplicates do matter, you'll need a way to count them. Here's using map not just to create a hash keyed by each element in the array, but also count the duplicates in the array:
my %array1_hash;
my %array2_hash;
map { $array1_hash{$_} += 1 } @array1;
map { $array2_hash{$_} += 1 } @array2;
Now, you can go through each hash and verify that not only do the keys exist, but that their entries match
for my $key ( keys %array1_hash ) {
if ( not exists $array2_hash{$key}
or $array1_hash{$key} != $array2_hash{$key} ) {
return 1; #Arrays differ
}
}
You will only exit the for loop if all of the entries in %array1_hash match their corresponding entries in %array2_hash. Now, you have to show that all of the entries in %array2_hash also match their entries in %array1_hash, and that %array2_hash doesn't have more entries. Fortunately, we can do what we did before:
if ( keys %array2_hash != keys %array1_hash ) {
return 1; #Arrays have a different number of keys: Don't match
}
else {
return; #Arrays have the same keys: They do match
}
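Putting the duplicate-aware pieces together, a self-contained version could look like the sketch below. The name arrays_differ and the decision to take array references as arguments are choices made here for illustration, not part of the answer above.

```perl
use strict;
use warnings;

# Returns 1 if the two arrays differ as multisets, 0 if they hold the
# same elements the same number of times (order is ignored).
sub arrays_differ {
    my ($first, $second) = @_;
    my (%count1, %count2);
    $count1{$_}++ for @$first;
    $count2{$_}++ for @$second;

    # Different numbers of distinct elements: cannot match.
    return 1 if keys %count1 != keys %count2;

    # Same number of keys, so checking one direction is enough.
    for my $key (keys %count1) {
        return 1 if !exists $count2{$key} || $count1{$key} != $count2{$key};
    }
    return 0;
}

print arrays_differ([qw(1 1 2 3)], [qw(3 1 2 1)]) ? "differ\n" : "same\n";
print arrays_differ([qw(1 1 2 3)], [qw(1 2 3)])   ? "differ\n" : "same\n";
```

Elements are compared as hash keys, which means they are compared as strings; 1 and 1.0 written as string literals would count as different elements.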
You can use this to get the difference between two arrays:
#!/usr/bin/perl -w
use strict;
my #list1 = (1, 2, 3, 4, 5);
my #list2 = (2, 3, 4);
my %diff;
@diff{ @list1 } = undef;
delete @diff{ @list2 };
You want to compare each element of @x against the element of the same index in @y, right? This will do it.
print "Index: $_ => \@x: $x[$_], \@y: $y[$_]\n"
for grep { $x[$_] != $y[$_] } 0 .. $#x;
...or...
foreach( 0 .. $#x ) {
print "Index: $_ => \@x: $x[$_], \@y: $y[$_]\n" if $x[$_] != $y[$_];
}
Which you choose kind of depends on whether you're more interested in keeping a list of indices to the dissimilar elements, or simply interested in processing the mismatches one by one. The grep version is handy for getting the list of mismatches. (original post)
n + n log n algorithm, if sure that elements are unique in each array (as hash keys)
my %count = ();
foreach my $element (@array1, @array2) {
$count{$element}++;
}
my @difference = grep { $count{$_} == 1 } keys %count;
my @intersect = grep { $count{$_} == 2 } keys %count;
my @union = keys %count;
So if I'm not sure of uniqueness and want to check the presence of the elements of array1 inside array2,
my %count = ();
foreach (@array1) {
$count{$_} = 1 ;
};
foreach (@array2) {
$count{$_} = 2 if $count{$_};
};
# N log N
if (grep { $_ == 1 } values %count) {
return 'Some element of array1 does not appear in array2';
} else {
return 'All elements of array1 are in array2';
}
# N + N log N
my @a = (1,2,3);
my @b=(2,3,1);
print "Equal" if (grep { $_ ~~ @b } @a) == @b;
Not elegant, but easy to understand:
#!/usr/local/bin/perl
use strict;
my $file1 = shift or die("need file1");
my $file2 = shift or die("need file2");;
my @file1lines = split/\n/,`cat $file1`;
my @file2lines = split/\n/,`cat $file2`;
my %lines;
foreach my $file1line(@file1lines){
$lines{$file1line}+=1;
}
foreach my $file2line(@file2lines){
$lines{$file2line}+=2;
}
while(my($key,$value)=each%lines){
if($value == 1){
print "$key is in only $file1\n";
}elsif($value == 2){
print "$key is in only $file2\n";
}elsif($value == 3){
print "$key is in both $file1 and $file2\n";
}
}
exit;
__END__
Try using List::Compare. It has solutions for all the operations that can be performed on arrays.

How do I store a 2d array in a hash in Perl?

I am struggling through objects in Perl, and am trying to create a 2d array and store it in a hash field of my object. I understand that to create a 2d array I need an array of references to arrays, but when I try to do it I get this error: Type of arg 1 to push must be array (not hash element). The constructor works fine, and set_seqs works fine, but my create_matrix sub is throwing these errors.
Here is what I am doing:
sub new {
my ($class) = @_;
my $self = {};
$self->{seq1} = undef;
$self->{seq2} = undef;
$self->{matrix} = ();
bless($self, $class);
return $self;
}
sub set_seqs {
my $self = shift;
$self->{seq1} = shift;
$self->{seq2} = shift;
print $self->{seq1};
}
sub create_matrix {
my $self = shift;
$self->set_seqs(shift, shift);
#create the 2d array of scores
#to create a matrix:
#create a 2d array of length [lengthofseq1][lengthofseq2]
for (my $i = 0; $i < length($self->{seq1}) - 1; $i++) {
#push a new array reference onto the matrix
#this line generates the error
push(@$self->{matrix}, []);
}
}
Any idea of what I am doing wrong?
You're missing an extra set of braces when you dereference $self. Try push @{$self->{matrix}}, [].
When in doubt (if you're not sure if you're referring to the correct value in a complicated data structure), add more braces. :) See perldoc perlreftut.
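To see the extra braces in action outside of the class, here is a minimal sketch. A plain hash reference stands in for the object, and the dimensions 3 and 4 are arbitrary values chosen for the example.

```perl
use strict;
use warnings;

# A bare hash reference stands in for the blessed object.
my $self = { matrix => [] };
my ($rows, $cols) = (3, 4);

for my $i (1 .. $rows) {
    # @{ ... } dereferences the array stored in the hash field,
    # so push receives a real array as its first argument.
    push @{ $self->{matrix} }, [ (0) x $cols ];
}

$self->{matrix}[1][2] = 7;    # row 1, column 2
print join(' ', @$_), "\n" for @{ $self->{matrix} };
```

Without the braces, @$self->{matrix} is parsed as @{$self}->{matrix}, which is why push complains about its first argument.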
Perl is a very expressive language. You can do all of that with the statement below.
$self->{matrix} = [ map { [ (0) x $seq2 ] } 1..$seq1 ];
Is this golf? Maybe, but it also avoids mucking with the finicky push prototype. I explode the statement below:
$self->{matrix} = [ # we want an array reference
map { # create a derivative list from the list you will pass it
[ (0) x $seq2 ] # another array reference, using the *repeat* operator
# in its list form, thus creating a list of 0's as
# long as the value given by $seq2, to fill out the
# reference's values.
}
1..$seq1 # we're not using the indexes as anything more than
# control, so, use them base-1.
]; # a completed array of arrays.
I have a standard subroutine to make tables:
sub make_matrix {
my ( $dim1, $dim2 ) = @_;
my @table = map { [ ( 0 ) x $dim2 ] } 1..$dim1;
return wantarray? @table : \@table;
}
And here's a more generalized array-of-arrays function:
sub multidimensional_array {
my $dim = shift;
return [ ( 0 ) x $dim ] unless @_; # edge case
my @table = map { scalar multidimensional_array( @_ ) } 1..$dim;
return wantarray ? @table : \@table;
}
sub create_matrix {
my($self,$seq1,$seq2) = @_;
$self->set_seqs($seq1, $seq2);
#create the 2d array of scores
#to create a matrix:
#create a 2d array of length [$seq1][$seq2]
for( 1..$seq1 ){
push @{$self->{matrix}}, [ (undef) x $seq2 ];
}
}

find extra, missing, invalid strings when comparing two lists in perl

List-1 List-2
one one
two three
three three
four four
five six
six seven
eight eighttt
nine nine
Looking to output
one | one PASS
two | * FAIL MISSING
three | three PASS
* | three FAIL EXTRA
four | four PASS
five | * FAIL MISSING
six | six PASS
* | seven FAIL EXTRA
eight | eighttt FAIL INVALID
nine | nine PASS
Actually the return from my current solution is a reference to the two modified lists and a reference to a "fail" list describing the failure for the index as either "no fail", "missing", "extra", or "invalid" which is also (obviously) fine output.
My current solution is:
sub compare {
local $thisfound = shift;
local $thatfound = shift;
local @thisorig = @{ $thisfound };
local @thatorig = @{ $thatfound };
local $best = 9999;
foreach $n (1..6) {
local $diff = 0;
local @thisfound = @thisorig;
local @thatfound = @thatorig;
local @fail = ();
for (local $i=0;$i<scalar(@thisfound) || $i<scalar(@thatfound);$i++) {
if($thisfound[$i] eq $thatfound[$i]) {
$fail[$i] = 'NO_FAIL';
next;
}
if($n == 1) { # 1 2 3
next unless __compare_missing__();
next unless __compare_extra__();
next unless __compare_invalid__();
} elsif($n == 2) { # 1 3 2
next unless __compare_missing__();
next unless __compare_invalid__();
next unless __compare_extra__();
} elsif($n == 3) { # 2 1 3
next unless __compare_extra__();
next unless __compare_missing__();
next unless __compare_invalid__();
} elsif($n == 4) { # 2 3 1
next unless __compare_extra__();
next unless __compare_invalid__();
next unless __compare_missing__();
} elsif($n == 5) { # 3 1 2
next unless __compare_invalid__();
next unless __compare_missing__();
next unless __compare_extra__();
} elsif($n == 6) { # 3 2 1
next unless __compare_invalid__();
next unless __compare_extra__();
next unless __compare_missing__();
}
push #fail,'INVALID';
$diff += 1;
}
if ($diff<$best) {
$best = $diff;
@thisbest = @thisfound;
@thatbest = @thatfound;
@failbest = @fail;
}
}
return (\@thisbest,\@thatbest,\@failbest)
}
sub __compare_missing__ {
my $j;
### Does that command match a later this command? ###
### If so most likely a MISSING command ###
for($j=$i+1;$j<scalar(@thisfound);$j++) {
if($thisfound[$j] eq $thatfound[$i]) {
$diff += $j-$i;
for ($i..$j-1) { push(@fail,'MISSING'); }
@end = @thatfound[$i..$#thatfound];
@thatfound = @thatfound[0..$i-1];
for ($i..$j-1) { push(@thatfound,'*'); }
push(@thatfound,@end);
$i=$j-1;
last;
}
}
$j == scalar(#thisfound);
}
sub __compare_extra__ {
my $j;
### Does this command match a later that command? ###
### If so, most likely an EXTRA command ###
for($j=$i+1;$j<scalar(@thatfound);$j++) {
if($thatfound[$j] eq $thisfound[$i]) {
$diff += $j-$i;
for ($i..$j-1) { push(@fail,'EXTRA'); }
@end = @thisfound[$i..$#thisfound];
@thisfound = @thisfound[0..$i-1];
for ($i..$j-1) { push (@thisfound,'*'); }
push(@thisfound,@end);
$i=$j-1;
last;
}
}
$j == scalar(#thatfound);
}
sub __compare_invalid__ {
my $j;
### Do later commands match? ###
### If so most likely an INVALID command ###
for($j=$i+1;$j<scalar(@thisfound);$j++) {
if($thisfound[$j] eq $thatfound[$j]) {
$diff += $j-$i;
for ($i..$j-1) { push(@fail,'INVALID'); }
$i=$j-1;
last;
}
}
$j == scalar(#thisfound);
}
But this isn't perfect ... who wants to simplify and improve? Specifically ... within a single data set, one order of searching is better for a subset and another order is better for a different subset.
If the arrays contain duplicate values, the answer is quite a bit more complicated than that.
See e.g. Algorithm::Diff or read about Levenshtein distance.
From perlfaq4's answer to How can I tell whether a certain element is contained in a list or array?:
(portions of this answer contributed by Anno Siegel and brian d foy)
Hearing the word "in" is an indication that you probably should have used a hash, not a list or array, to store your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren't.
That being said, there are several ways to approach this. In Perl 5.10 and later, you can use the smart match operator to check that an item is contained in an array or a hash:
use 5.010;
if( $item ~~ @array )
{
say "The array contains $item"
}
if( $item ~~ %hash )
{
say "The hash contains $item"
}
With earlier versions of Perl, you have to do a bit more work. If you are going to make this query many times over arbitrary string values, the fastest way is probably to invert the original array and maintain a hash whose keys are the first array's values:
@blues = qw/azure cerulean teal turquoise lapis-lazuli/;
%is_blue = ();
for (@blues) { $is_blue{$_} = 1 }
Now you can check whether $is_blue{$some_color}. It might have been a good idea to keep the blues all in a hash in the first place.
If the values are all small integers, you could use a simple indexed array. This kind of an array will take up less space:
@primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
@is_tiny_prime = ();
for (@primes) { $is_tiny_prime[$_] = 1 }
# or simply @is_tiny_prime[@primes] = (1) x @primes;
Now you check whether $is_tiny_prime[$some_number].
If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings instead:
@articles = ( 1..10, 150..2000, 2017 );
undef $read;
for (@articles) { vec($read,$_,1) = 1 }
Now check whether vec($read,$n,1) is true for some $n.
These methods guarantee fast individual tests but require a re-organization of the original list or array. They only pay off if you have to test multiple values against the same array.
If you are testing only once, the standard module List::Util exports the function first for this purpose. It works by stopping once it finds the element. It's written in C for speed, and its Perl equivalent looks like this subroutine:
sub first (&@) {
my $code = shift;
foreach (@_) {
return $_ if &{$code}();
}
undef;
}
If speed is of little concern, the common idiom uses grep in scalar context (which returns the number of items that passed its condition) to traverse the entire list. This does have the benefit of telling you how many matches it found, though.
my $is_there = grep $_ eq $whatever, @array;
If you want to actually extract the matching elements, simply use grep in list context.
my @matches = grep $_ eq $whatever, @array;
sub compare {
local @d = ();
my $this = shift;
my $that = shift;
my $distance = _levenshteindistance($this, $that);
my @thisorig = @{ $this };
my @thatorig = @{ $that };
my $s = $#thisorig;
my $t = $#thatorig;
@this = ();
@that = ();
@fail = ();
while($s>0 || $t>0) {
# deletion, insertion, substitution
my $min = _minimum($d[$s-1][$t],$d[$s][$t-1],$d[$s-1][$t-1]);
if($min == $d[$s-1][$t-1]) {
unshift(@this,$thisorig[$s]);
unshift(@that,$thatorig[$t]);
if($d[$s][$t] > $d[$s-1][$t-1]) {
unshift(@fail,'INVALID');
} else {
unshift(@fail,'NO_FAIL');
}
$s -= 1;
$t -= 1;
} elsif($min == $d[$s][$t-1]) {
unshift(@this,'*');
unshift(@that,$thatorig[$t]);
unshift(@fail,'EXTRA');
$t -= 1;
} elsif($min == $d[$s-1][$t]) {
unshift(@this,$thisorig[$s]);
unshift(@that,'*');
unshift(@fail,'MISSING');
$s -= 1;
} else {
die("Error! $!");
}
}
return(\@this, \@that, \@fail);
}
sub _minimum {
my $ret = 2**53;
foreach $in (@_) {
$ret = $ret < $in ? $ret : $in;
}
$ret;
}
sub _levenshteindistance {
my $s = shift;
my $t = shift;
my @s = @{ $s };
my @t = @{ $t };
for(my $i=0;$i<scalar(@s);$i++) {
$d[$i] = ();
}
for(my $i=0;$i<scalar(@s);$i++) {
$d[$i][0] = $i # deletion
}
for(my $j=0;$j<scalar(@t);$j++) {
$d[0][$j] = $j # insertion
}
for(my $j=1;$j<scalar(@t);$j++) {
for(my $i=1;$i<scalar(@s);$i++) {
if ($s[$i] eq $t[$j]) {
$d[$i][$j] = $d[$i-1][$j-1];
} else {
# deletion, insertion, substitution
$d[$i][$j] = _minimum($d[$i-1][$j]+1,$d[$i][$j-1]+1,$d[$i-1][$j-1]+1);
}
}
}
foreach $a (@d) {
@a = @{ $a };
foreach $b (@a) {
printf STDERR "%2d ",$b;
}
print STDERR "\n";
}
return $d[$#s][$#t];
}
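For comparison, here is a compact sketch of the same edit-distance idea using lexical variables only and a table indexed from the empty prefix (row 0 and column 0), which keeps the boundary cases simple. The function name align_lists and the status strings are illustrative choices, not taken from the code above. Note that with a unit cost for substitutions the alignment may prefer an INVALID pair over a MISSING plus EXTRA pair, so it will not necessarily reproduce the exact output wished for in the question.

```perl
use strict;
use warnings;

# Align two lists with a Levenshtein (edit-distance) table.
# Returns a list of [left, right, status] rows.
sub align_lists {
    my ($left, $right) = @_;
    my ($m, $n) = (scalar @$left, scalar @$right);

    # $d[$i][$j] = edit distance between the first $i items of @$left
    # and the first $j items of @$right.
    my @d;
    $d[$_][0] = $_ for 0 .. $m;
    $d[0][$_] = $_ for 0 .. $n;
    for my $i (1 .. $m) {
        for my $j (1 .. $n) {
            my $cost = $left->[$i-1] eq $right->[$j-1] ? 0 : 1;
            my $best = $d[$i-1][$j-1] + $cost;                     # match or substitution
            $best = $d[$i-1][$j] + 1 if $d[$i-1][$j] + 1 < $best;  # deletion
            $best = $d[$i][$j-1] + 1 if $d[$i][$j-1] + 1 < $best;  # insertion
            $d[$i][$j] = $best;
        }
    }

    # Walk back from the bottom-right corner to recover one optimal alignment.
    my @rows;
    my ($i, $j) = ($m, $n);
    while ($i > 0 || $j > 0) {
        if ($i > 0 && $j > 0) {
            my $same = $left->[$i-1] eq $right->[$j-1];
            if ($d[$i][$j] == $d[$i-1][$j-1] + ($same ? 0 : 1)) {
                unshift @rows, [ $left->[$i-1], $right->[$j-1],
                                 $same ? 'PASS' : 'FAIL INVALID' ];
                $i--; $j--;
                next;
            }
        }
        if ($i > 0 && $d[$i][$j] == $d[$i-1][$j] + 1) {
            unshift @rows, [ $left->[$i-1], '*', 'FAIL MISSING' ];
            $i--;
        }
        else {
            unshift @rows, [ '*', $right->[$j-1], 'FAIL EXTRA' ];
            $j--;
        }
    }
    return @rows;
}

printf "%-6s | %-6s %s\n", @$_ for align_lists([qw(a b c)], [qw(a c)]);
```

Raising the substitution cost to 2 in both places where it is computed would make the alignment favor MISSING and EXTRA rows over INVALID ones; which behavior is preferable depends on the data.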
The trick in Perl (and similar languages) is the hash, which doesn't care about order.
Suppose that the first array is the one that hold the valid elements. Construct a hash with those values as keys:
my @valid = qw( one two ... );
my %valid = map { $_, 1 } @valid;
Now, to find the invalid elements, you just have to find the ones not in the %valid hash:
my @invalid = grep { ! exists $valid{$_} } @array;
If you want to know the array indices of the invalid elements:
my @invalid_indices = grep { ! exists $valid{ $array[$_] } } 0 .. $#array;
Now, you can expand that to find the repeated elements too. Not only do you check the %valid hash, but also keep track of what you have already seen:
my %Seen;
my @invalid_indices = grep { $Seen{ $array[$_] }++; ! exists $valid{ $array[$_] } } 0 .. $#array;
The repeated valid elements are the ones with a value in %Seen that is greater than 1:
my @repeated_valid = grep { $Seen{$_} > 1 } @valid;
To find the missing elements, you look in %Seen to check what isn't in there.
my @missing = grep { ! $Seen{$_ } } @valid;
From perlfaq4's answer to How do I compute the difference of two arrays? How do I compute the intersection of two arrays?:
Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array:
@union = @intersection = @difference = ();
%count = ();
foreach $element (@array1, @array2) { $count{$element}++ }
foreach $element (keys %count) {
push @union, $element;
push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}
Note that this is the symmetric difference, that is, all elements in either A or in B but not in both. Think of it as an xor operation.