Remove an array from an AoA in Perl

I have an array of arrays that looks like this:
$VAR1 = [
'sid_R.ba',
'PS20TGB2YM13',
'SID_r.BA',
'ARS',
'XBUE'
]; $VAR2 = [
'sddff.pk',
'PQ10XD06K800',
'SDDFF.PK',
'USD',
'PINX'
]; $VAR3 = [
'NULL',
'NULL',
'NULL',
'.',
'XNAS'
]; $VAR4 = [
'NULL',
'NULL',
'NULL',
'.',
'XNAS'
]; $VAR5 = [
'NULL',
'NULL',
'NULL',
'EUR',
'OTCX'
]; $VAR6 = [
'sid.ba',
'PS20TGB1TN17',
'SID.BA',
'ARS',
'XBUE'
];
I want to remove the complete block (array ref) if any of its elements is NULL.
I have code that generates the array, so I tried a for loop to delete the blocks, but deleting inside the loop shifts the array's indexes.
I don't know in advance what order the array will be in or how long it will be.
Please, I need a generic solution.
Please help.
Thanks

You seem to have an array like
my @AoA = (
[1, 2, 3],
[4, 5, 6],
[7, 8, "NULL"],
[9, 10],
);
You want to select all child arrays that do not contain "NULL". Easy: just use nested grep:
my @AoA_sans_NULL = grep {
    not grep { $_ eq "NULL" } @$_
} @AoA;
The grep { CONDITION } @array selects all elements from @array where the CONDITION evaluates to true.
The grep { $_ eq "NULL" } @$_ counts the number of "NULL"s in the inner array. If this is zero, our condition is true; otherwise, we don't want to keep that sub-array.
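A minimal sketch of the nested grep applied to rows modelled on the question's Dumper output. The names @rows and @kept are illustrative and not taken from the original code:

```perl
use strict;
use warnings;

# Sample rows modelled on the question's Dumper output (shortened).
my @rows = (
    [ 'sid_R.ba', 'PS20TGB2YM13', 'SID_r.BA', 'ARS', 'XBUE' ],
    [ 'NULL',     'NULL',         'NULL',     '.',   'XNAS' ],
    [ 'NULL',     'NULL',         'NULL',     'EUR', 'OTCX' ],
    [ 'sid.ba',   'PS20TGB1TN17', 'SID.BA',   'ARS', 'XBUE' ],
);

# Keep a row only when the inner grep finds no "NULL" element in it.
my @kept = grep {
    not grep { $_ eq 'NULL' } @$_
} @rows;

# Prints the two rows that contain no "NULL".
print join( ',', @$_ ), "\n" for @kept;
```

Because grep builds a new list, @rows itself is left untouched, and the order or length of the input does not matter.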

use List::MoreUtils qw(none);
my @filtered = grep {
    none { $_ eq "NULL" } @$_;
} @array;

Does this do what you want?
my @new_array = grep { scalar(grep { $_ eq 'NULL' } @{$_}) == 0 } @old_array;

Old school:
my @filtered = ();
ARRAY_LOOP:
for my $array ( @AoA ){
    ITEM_LOOP:
    for my $item ( @$array ){
        next ARRAY_LOOP if $item eq 'NULL';
    } # end ITEM_LOOP
    push @filtered, $array;
} # end ARRAY_LOOP

This code will be slower than the others, but an in-place solution might be useful if the data-set is very large.
use List::MoreUtils qw(any);
for(my $i = 0; $i < @AoA; $i ++) {
    splice @AoA, $i --, 1
        if any { $_ eq "NULL" } @{ $AoA[$i] };
}

A non-grep of a grep solution:
my @array = ...; # Array of Arrays
for my $array_index ( reverse 0 .. $#array ) {
    my @inner_array = @{ $array[$array_index] };
    if ( grep /^NULL$/, @inner_array ) {
        splice @array, $array_index, 1;
    }
}
say Dumper @array;
The splice command removes the entire subarray. I don't need to create @inner_array; I could have used the dereferenced @{ $array[$array_index] } directly in the if statement, but I like going for clarity.
The only gotcha is that you have to go through your array of arrays backwards. If you go from the first element to the last and remove element 2, all the later elements have their indexes decremented. If I remove element 4 first, elements 0 to 3 don't change their indexes.
It's not as elegant as the grep of a grep solutions, but it's a lot easier to maintain. Imagine someone who has to go through your program six months from now trying to figure out what:
grep { not grep { $_ eq "NULL" } @$_ } @array;
is doing.
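To illustrate the index-shifting gotcha described above, here is a small sketch on made-up data (the arrays @forward and @backward are hypothetical). It compares a naive forward pass, which does not adjust its index after a removal, with the backward pass:

```perl
use strict;
use warnings;

# Hypothetical data: two adjacent rows contain "NULL".
my @forward  = ( ['a'], ['NULL'], ['NULL'], ['b'] );
my @backward = map { [@$_] } @forward;    # independent copy of the rows

# Forward pass without adjusting the index: after index 1 is removed,
# the second "NULL" row slides into index 1 and is never examined.
for ( my $i = 0; $i < @forward; $i++ ) {
    splice @forward, $i, 1 if grep { $_ eq 'NULL' } @{ $forward[$i] };
}

# Backward pass: a removal only shifts rows that were already examined.
for my $i ( reverse 0 .. $#backward ) {
    splice @backward, $i, 1 if grep { $_ eq 'NULL' } @{ $backward[$i] };
}

print scalar(@forward),  "\n";    # 3 (one "NULL" row survived)
print scalar(@backward), "\n";    # 2
```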

Related

Remove values from array inside a hash

I have a hash %m_h with a couple of different data types inside. I want to remove the item 'q20_bases' from the array in $VAR4 but can't figure out how.
Data Structure (From print Dumper %m_h)
$VAR1 = 'run_m';
$VAR2 = [
'run_id',
'machine',
'raw_clusters',
'passed_filter_reads',
'yield'
];
$VAR3 = 'ln_m';
$VAR4 = [
'run_id',
'lane_number',
'read_number',
'length',
'passed_filter_reads',
'percent_passed_filter_clusters',
'q20_bases',
'q30_bases',
'yield',
'raw_clusters',
'raw_clusters_sd',
'passed_filter_clusters_per_tile',
'passed_filter_clusters_per_tile_sd',
'percent_align',
'percent_align_sd'
];
I tried delete $m_h{'q20_bases'}; though it did nothing and I'm not sure what direction to head in.
delete removes a key and the associated value from a hash, not an element from an array.
You can use grep to select the elements of the array that are different to q20_bases.
$m_h{ln_m} = [grep $_ ne 'q20_bases', @{ $m_h{ln_m} }];
or
@{ $m_h{ln_m} } = grep $_ ne 'q20_bases', @{ $m_h{ln_m} };
You can also use splice to remove an element from an array, but you need to know its index:
my ($i) = grep $m_h{ln_m}[$_] eq 'q20_bases', 0 .. $#{ $m_h{ln_m} };
splice @{ $m_h{ln_m} }, $i, 1;
You can see that you always need to dereference the value with @{...} to get the array from the array reference. Recent Perls also provide an alternative syntax for it:
$m_h{ln_m}->@*
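A small sketch of the splice variant on a trimmed-down, made-up version of the hash. The `defined` guard is an addition that is not in the answer above: if the element is not found, the index is undefined and splice would treat it as 0 and remove the first element.

```perl
use strict;
use warnings;

# Trimmed-down, illustrative version of the hash from the question.
my %m_h = (
    run_m => [ 'run_id', 'machine', 'yield' ],
    ln_m  => [ 'run_id', 'q20_bases', 'q30_bases', 'yield' ],
);

# Find the index of the element, then splice only if it was found.
my ($i) = grep { $m_h{ln_m}[$_] eq 'q20_bases' } 0 .. $#{ $m_h{ln_m} };
splice @{ $m_h{ln_m} }, $i, 1 if defined $i;

print "@{ $m_h{ln_m} }\n";    # run_id q30_bases yield
```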

Printing values of an array from an array of array references

How can I print the values of an array? I have tried several ways, but I am unable to get the required values out of the arrays:
@array;
Dumper output is as below:
$VAR1 = [
'a',
'b',
'c'
];
$VAR1 = [
'd',
'e',
'f'
];
$VAR1 = [
'g',
'h',
'i'
];
$VAR1 = [
'j',
'k',
'l'
];
for my $value (@array) {
    my $ip = $value->[0];
    DEBUG("DEBUG '$ip\n'");
}
I am getting the output below, which means that for each instance I am only getting the first value.
a
d
g
j
I have tried several approaches:
First option:
my $size = @array;
for ($n=0; $n < $size; $n++) {
    my $value=$array[$n];
    DEBUG( "DEBUG: Element is as $value" );
}
Second option:
for my $value (@array) {
    my $ip = $value->[$_];
    DEBUG("DEBUG Element is '$ip\n'");
}
What is the best way to do this?
It is obvious that you have a list of arrays. In your first example you only loop over the top-level list and print the first (0th) value of each inner array. Barring any automatic dumpers, you need to loop over both levels.
for my $value (@array) {
    for my $ip (@$value) {
        DEBUG("DEBUG '$ip\n'");
    }
}
You want to dereference here so you need to do something like:
my @array_of_arrays = ([qw/a b c/], [qw/d e f/ ], [qw/i j k/]);
for my $anon_array (@array_of_arrays) { say for @{$anon_array} }
Or using your variable names:
use strict;
use warnings;
my @array = ([qw/a b c/], [qw/d e f/], [qw/i j k/]);
for my $ip (@array) {
    print join "", @{$ip} , "\n"; # or "say"
}
Since there are anonymous arrays involved I have focused on dereferencing (using PBP style!) instead of nested loops, but print for is a loop in disguise really.
Cheers.

Create a hash of array: Displaying array reference

Below is my code (just playing with hashes), where I want to create a hash of arrays (keys pointing to arrays). But the output includes array references. Why are these array references displayed?
#!/usr/bin/perl
my @result = (0,0,0);
my @operator = ('AP', 'MP', 'UP');
my %operator_res;
for ( $i = 0; $i <= $#operator; $i++ ) {
    if ( $i == 2 ) {
        @result = (4,5,6);
    } elsif ( $i == 1 ) {
        @result = (1,2,3);
    }
    @{$operator_res{$operator[$i]}} = @result;
}
foreach $keys (%operator_res) {
    print "$keys:";
    #print "@{$operator_res{$keys}}\n";
    print "$operator_res{$keys}[0], $operator_res{$keys}[1], $operator_res{$keys}[2]\n";
}
Output is
UP:4, 5, 6
ARRAY(0x17212e70):, , Why is this array reference printing?
AP:0, 0, 0
ARRAY(0x17212e00):, ,
MP:1, 2, 3
ARRAY(0x17212e20):, ,
foreach $keys (%operator_res)
should be
foreach $keys (keys %operator_res)
Your foreach loop iterates over each element of %operator_res, not just over the keys. As ikagim already answered, you have to use keys to get only the keys of the hash.
If you look at %operator_res with Data::Dumper, the output is:
$VAR1 = 'UP';
$VAR2 = [
4,
5,
6
];
$VAR3 = 'AP';
$VAR4 = [
0,
0,
0
];
$VAR5 = 'MP';
$VAR6 = [
1,
2,
3
];
As you see, you will always get two iterations per element: one for the key and one for the array ref.
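As a quick illustration of that point, the following sketch (with the hash filled in by hand rather than by the question's loop) shows how a hash flattens in list context and how iterating over keys avoids the extra iterations:

```perl
use strict;
use warnings;

# Hash of arrays, filled in by hand for illustration.
my %operator_res = (
    AP => [ 0, 0, 0 ],
    MP => [ 1, 2, 3 ],
    UP => [ 4, 5, 6 ],
);

# In list context the hash flattens to key, value, key, value, ...
my @flat = %operator_res;
print scalar(@flat), "\n";    # 6: three keys plus three array refs

# Iterating over the keys (sorted for a stable order) visits each pair once.
for my $key ( sort keys %operator_res ) {
    print "$key: @{ $operator_res{$key} }\n";
}
```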
A hash value in Perl must be a scalar. To simulate multidimensional hashes, use values that are references to hashes or arrays.
The line
@{$operator_res{$operator[$i]}} = @result;
in your question is equivalent to
$operator_res{ $operator[$i] } = [ @result ];
That is, the value associated with the key $operator[$i] at the time is a reference to a new array whose contents are the same as @result.
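The difference between storing a copy and storing a reference to @result itself can be seen in a short sketch. The hash names %copy_of and %alias_of are made up for illustration:

```perl
use strict;
use warnings;

my @result = ( 1, 2, 3 );
my ( %copy_of, %alias_of );

$copy_of{MP}  = [@result];    # new anonymous array holding a copy
$alias_of{MP} = \@result;     # reference to @result itself

@result = ( 4, 5, 6 );        # later change to the original array

print "@{ $copy_of{MP} }\n";     # 1 2 3
print "@{ $alias_of{MP} }\n";    # 4 5 6
```

This is why the loop in the question can reuse @result for every key: each key receives its own copy of the values.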
For many examples, read the perllol documentation.
You could use Data::Dumper to print out your data in a well formatted way:
use Data::Dumper;
print Dumper(\%operator_res);
Q: Why is this array reference printing?
A: Because of this line: print "$keys:";

How can I dereference an array of arrays in Perl?

How do I dereference an array of arrays when passed to a function?
I am doing it like this:
my @a = {\@array1, \@array2, \@array3};
func(\@a);
func{
    @b = @_;
    @c = @{@b};
}
Actually, I want the array @c to contain the addresses of @array1, @array2, and @array3.
my @a = {\@array1, \@array2, \@array3};
The above is an array with a single member: a reference to a hash containing:
{ ''.\@array1 => \@array2, ''.\@array3 => undef }
This is because Perl coerces the reference to @array1 into a string when it is used as a hash key. And Perl allows a scalar hash reference to be assigned to an array, because it is "understood" that you want an array whose first element is the scalar you assigned to it.
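A short sketch that makes the difference visible; the names @wrong and @right are illustrative. With warnings enabled, the braces version also emits an "Odd number of elements in anonymous hash" warning:

```perl
use strict;
use warnings;

my @array1 = ( 1, 2 );
my @array2 = ( 3, 4 );
my @array3 = ( 5, 6 );

# Braces build one anonymous hash; the array receives a single hash ref.
my @wrong = { \@array1, \@array2, \@array3 };

# Parentheses build a list of three array references.
my @right = ( \@array1, \@array2, \@array3 );

print scalar(@wrong), " ", ref $wrong[0], "\n";    # 1 HASH
print scalar(@right), " ", ref $right[0], "\n";    # 3 ARRAY
```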
You create an array of arrays, like so:
my @a = (\@array1, \@array2, \@array3);
And then in your function you would unpack them, like so:
sub func {
    my $ref = shift;
    foreach my $arr ( @$ref ) {
        my @list_of_values = @$arr;
    }
}
Or some variation thereof; a map, say, would be the easiest expression:
my @list_of_entries = map { @$_ } @$ref;
In your example, @c as a list of addresses is simply the same thing as a properly constructed @a.
You may want to read perldoc perlreftut, perldoc perlref, and perldoc perldsc.
You can say:
sub func {
    my $arrayref = shift;
    for my $aref (@$arrayref) {
        print join(", ", @$aref), "\n";
    }
}
my @array1 = (1, 2, 3);
my @array2 = (4, 5, 6);
my @array3 = (7, 8, 9);
my @a = \(@array1, @array2, @array3);
func \@a;
or more compactly:
sub func {
    my $arrayref = shift;
    for my $aref (@$arrayref) {
        print join(", ", @$aref), "\n";
    }
}
func [ [1, 2, 3], [4, 5, 6], [7, 8, 9] ];
Read the perlreftut documentation.
Edit: Others point out a good point I missed at first. In the initialization of @a, you probably meant either @a = (...) (create array containing references) or $arrayref = [...] (create reference to array), not {...} (create reference to hash). The rest of this post pretends you had the @a = (...) version.
Since you pass one argument (a reference to @a) to func, @_ is a list containing that one reference. You can get that reference and then dereference it by doing:
sub func {
    my $arrayref = shift;
    my @c = @{$arrayref};
}
Or in one line, it would look like:
sub func {
    my @c = @{shift()};
}
(If you hadn't used the backslash in func(\@a), @_ would be equal to @a, the array of three references.)
The following function is designed to take either an array or an array reference and give back a sorted array of unique values. Undefined values are removed and HASH and GLOB are left as is.
#!/usr/bin/perl
use strict; use warnings;
my @one = qw / dog rat / ;
my @two = qw / dog mice / ;
my @tre = ( "And then they said it!", "No!?? ", );
open my $H, '<', $0 or die "unable to open $0 to read";
my $dog; # to show behavior with undefined value
my %hash; $hash{pig}{mouse}=55; # to show that it leaves HASH alone
my $rgx = '(?is)dog'; $rgx = qr/$rgx/; # included for kicks
my @whoo = (
'hey!',
$dog, # undefined
$rgx,
1, 2, 99, 999, 55.5, 3.1415926535,
%hash,
$H,
[ 1, 2,
[ 99, 55, \@tre, ],
3, ],
\@one, \@two,
[ 'fee', 'fie,' ,
[ 'dog', 'dog', 'mice', 'gopher', 'piranha', ],
[ 'dog', 'dog', 'mice', 'gopher', 'piranha', ],
],
[ 1, [ 1, 2222, ['no!', 'no...', 55, ], ], ],
[ [ [ 'Rat!', [ 'Non,', 'Tu es un rat!' , ], ], ], ],
'Hey!!',
0.0_1_0_1,
-33,
);
print join ( "\n",
recursively_dereference_sort_unique_array( [ 55, 9.000005555, ], @whoo, \@one, \@whoo, [ $H ], ),
"\n", );
close $H;
exit;
sub recursively_dereference_sort_unique_array
{
# recursively dereference array of arrays; return unique values sorted. Leave HASH and GLOB (filehandles) as they are.
# 2020v10v04vSunv12h20m15s
my $sb_name = (caller(0))[3];
@_ = grep defined, @_; # https://stackoverflow.com/questions/11122977/how-do-i-remove-all-undefs-from-array
my @redy = grep { !/^ARRAY\x28\w+\x29$/ } @_; # redy==the subset that is "ready"
my @noty = grep { /^ARRAY\x28\w+\x29$/ } @_; # noty==the subset that is "not yet"
my $countiter = 0;
while (1)
{
$countiter++;
die "$sb_name: are you in an infinite loop?" if ($countiter > 99);
my @next;
foreach my $refarray ( @noty )
{
my @tmparray = @$refarray;
push @next, @tmparray;
}
@next = grep defined, @next;
my @okay= grep { !/^ARRAY\x28\w+\x29$/ } @next;
@noty = grep { /^ARRAY\x28\w+\x29$/ } @next;
push @redy, @okay;
my %hash = map { $_ => 1 } @redy; # trick to get unique values
@redy = sort keys %hash;
return @redy unless (scalar @noty);
}
}
It should be
sub func {
$b = shift;
}
if you're passing in a reference. Hope that helps some.

Difference of Two Arrays Using Perl

I have two arrays. I need to check and see if the elements of one appear in the other one.
Is there a more efficient way to do it than nested loops? I have a few thousand elements in each and need to run the program frequently.
Another way to do it is to use Array::Utils
use Array::Utils qw(:all);
my @a = qw( a b c d );
my @b = qw( c d e f );
# symmetric difference
my @diff = array_diff(@a, @b);
# intersection
my @isect = intersect(@a, @b);
# unique union
my @unique = unique(@a, @b);
# check if arrays contain same members
if ( !array_diff(@a, @b) ) {
    # do something
}
# get items from array @a that are not in array @b
my @minus = array_minus( @a, @b );
perlfaq4 to the rescue:
How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array:
@union = @intersection = @difference = ();
%count = ();
foreach $element (@array1, @array2) { $count{$element}++ }
foreach $element (keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}
If you properly declare your variables, the code looks more like the following:
my %count;
for my $element (@array1, @array2) { $count{$element}++ }
my ( @union, @intersection, @difference );
for my $element (keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}
You need to provide a lot more context. There are more efficient ways of doing that ranging from:
Go outside of Perl and use shell (sort + comm)
Map one array into a Perl hash and then loop over the other one checking hash membership. This has linear complexity ("M+N", basically looping over each array once) as opposed to a nested loop, which has "M*N" complexity.
Example:
my %second = map {$_=>1} @second;
my @only_in_first = grep { !$second{$_} } @first;
# use a foreach loop with `last` instead of "grep"
# if you only want yes/no answer instead of full list
Use a Perl module that does the last bullet point for you (List::Compare was mentioned in comments)
Do it based on timestamps of when elements were added if the volume is very large and you need to re-compare often. A few thousand elements is not really big enough, but I recently had to diff 100k sized lists.
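The yes/no variant mentioned in the comment of the example above (a foreach loop with `last` instead of grep) might look like the following sketch. The sample data and the names %in_second and $all_present are made up for illustration:

```perl
use strict;
use warnings;

my @first  = qw( a b c d );
my @second = qw( c d e f );

# Build the lookup hash once: one pass over @second.
my %in_second = map { $_ => 1 } @second;

# Stop at the first element of @first that is missing from @second.
my $all_present = 1;
for my $item (@first) {
    next if $in_second{$item};
    $all_present = 0;
    last;
}

print $all_present ? "subset\n" : "not a subset\n";    # not a subset
```

Stopping early avoids scanning the rest of @first once the answer is known, which matters more as the arrays grow.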
You can try Array::Utils, and it makes it look nice and simple, but it's not doing any powerful magic on the back end. Here's the array_diff code:
sub array_diff(\@\@) {
    my %e = map { $_ => undef } @{$_[1]};
    return @{[ ( grep { (exists $e{$_}) ? ( delete $e{$_} ) : ( 1 ) } @{ $_[0] } ), keys %e ] };
}
Since Array::Utils isn't a standard module, you need to ask yourself if it's worth the effort to install and maintain this module. Otherwise, it's pretty close to DVK's answer.
There are certain things you must watch out for, and you have to define what you want to do in that particular case. Let's say:
@array1 = qw(1 1 2 2 3 3 4 4 5 5);
@array2 = qw(1 2 3 4 5);
Are these arrays the same? Or are they different? They have the same values, but there are duplicates in @array1 and not in @array2.
What about this?
@array1 = qw( 1 1 2 3 4 5 );
@array2 = qw( 1 1 2 3 4 5 );
I would say that these arrays are the same, but Array::Utils::array_diff begs to differ. This is because Array::Utils assumes that there are no duplicate entries.
And even the Perl FAQ pointed out by mob says that "It assumes that each element is unique in a given array." Is this an assumption you can make?
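To see why the uniqueness assumption matters, here is a sketch of the counting approach from the FAQ run on made-up data where one array contains a duplicate. The names @fixed, %seen1 and %seen2 are illustrative:

```perl
use strict;
use warnings;

my @array1 = qw( 1 1 2 );    # 1 appears twice in the same array
my @array2 = qw( 2 3 );

my %count;
$count{$_}++ for @array1, @array2;

# A count above 1 is taken to mean "present in both arrays".
my @intersection = sort grep { $count{$_} > 1 } keys %count;
print "@intersection\n";    # 1 2, although 1 is not in @array2

# De-duplicating each array first restores the assumption.
my %seen1 = map { $_ => 1 } @array1;
my %seen2 = map { $_ => 1 } @array2;
my @fixed = sort grep { $seen2{$_} } keys %seen1;
print "@fixed\n";           # 2
```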
No matter what, hashes are the answer. It's easy and quick to look up a hash. The problem is what do you want to do with unique values.
Here's a solid solution that assumes duplicates don't matter:
sub array_diff {
    my @array1 = @{ shift() };
    my @array2 = @{ shift() };
    my %array1_hash;
    my %array2_hash;
    # Create a hash entry for each element in @array1
    for my $element ( @array1 ) {
        $array1_hash{$element} = 1;
    }
    # Same for @array2: This time, use map instead of a loop
    map { $array2_hash{$_} = 1 } @array2;
    for my $entry ( @array2 ) {
        if ( not $array1_hash{$entry} ) {
            return 1; # Entry in @array2 but not @array1: Differ
        }
    }
    if ( keys %array1_hash != keys %array2_hash ) {
        return 1; # Arrays differ
    }
    else {
        return 0; # Arrays contain the same elements
    }
}
If duplicates do matter, you'll need a way to count them. Here's using map not just to create a hash keyed by each element in the array, but also count the duplicates in the array:
my %array1_hash;
my %array2_hash;
map { $array1_hash{$_} += 1 } @array1;
map { $array2_hash{$_} += 1 } @array2;
Now, you can go through each hash and verify that not only do the keys exist, but that their entries match:
for my $key ( keys %array1_hash ) {
if ( not exists $array2_hash{$key}
or $array1_hash{$key} != $array2_hash{$key} ) {
return 1; #Arrays differ
}
}
You will only exit the for loop if all of the entries in %array1_hash match their corresponding entries in %array2_hash. Now, you have to show that all of the entries in %array2_hash also match their entries in %array1_hash, and that %array2_hash doesn't have more entries. Fortunately, we can do what we did before:
if ( keys %array2_hash != keys %array1_hash ) {
return 1; #Arrays have a different number of keys: Don't match
}
else {
return; #Arrays have the same keys: They do match
}
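Putting the pieces above together, a self-contained version might look like the sketch below. The function name arrays_differ is hypothetical, and unlike the fragment above it returns 0 rather than an empty list when the arrays match:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the arrays differ, 0 if they hold
# the same elements with the same multiplicities (order is ignored).
sub arrays_differ {
    my ( $first, $second ) = @_;
    my ( %count1, %count2 );
    $count1{$_}++ for @$first;
    $count2{$_}++ for @$second;

    # Different numbers of distinct elements: the arrays cannot match.
    return 1 if keys %count1 != keys %count2;

    # Every element must occur the same number of times in both.
    for my $key ( keys %count1 ) {
        return 1 if !exists $count2{$key} or $count1{$key} != $count2{$key};
    }
    return 0;
}

print arrays_differ( [qw(1 1 2 3)], [qw(3 1 2 1)] ), "\n";    # 0
print arrays_differ( [qw(1 1 2 3)], [qw(1 2 3)] ),   "\n";    # 1
```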
You can use this to get the difference between two arrays:
#!/usr/bin/perl -w
use strict;
my @list1 = (1, 2, 3, 4, 5);
my @list2 = (2, 3, 4);
my %diff;
@diff{ @list1 } = undef;
delete @diff{ @list2 };
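The snippet above leaves the result in the keys of %diff. A sketch that also extracts and prints it, using an empty-list assignment for the slice and a hypothetical @only_in_list1 array for the result:

```perl
use strict;
use warnings;

my @list1 = ( 1, 2, 3, 4, 5 );
my @list2 = ( 2, 3, 4 );

my %diff;
@diff{@list1} = ();      # one key per element of @list1
delete @diff{@list2};    # drop every key that also occurs in @list2

# The remaining keys are the elements found only in @list1.
my @only_in_list1 = sort { $a <=> $b } keys %diff;
print "@only_in_list1\n";    # 1 5
```

Note that this is a one-sided difference: elements that occur only in @list2 are not reported.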
You want to compare each element of @x against the element of the same index in @y, right? This will do it.
print "Index: $_ => \@x: $x[$_], \@y: $y[$_]\n"
    for grep { $x[$_] != $y[$_] } 0 .. $#x;
...or...
foreach( 0 .. $#x ) {
    print "Index: $_ => \@x: $x[$_], \@y: $y[$_]\n" if $x[$_] != $y[$_];
}
Which you choose kind of depends on whether you're more interested in keeping a list of indices to the dissimilar elements, or simply interested in processing the mismatches one by one. The grep version is handy for getting the list of mismatches. (original post)
An n + n log n algorithm, if you are sure that elements are unique in each array (as hash keys):
my %count = ();
foreach my $element (@array1, @array2) {
    $count{$element}++;
}
my @difference = grep { $count{$_} == 1 } keys %count;
my @intersect = grep { $count{$_} == 2 } keys %count;
my @union = keys %count;
So if I'm not sure of uniqueness and want to check for the presence of the elements of array1 inside array2:
my %count = ();
foreach (@array1) {
    $count{$_} = 1 ;
};
foreach (@array2) {
    $count{$_} = 2 if $count{$_};
};
# N log N
if (grep { $_ == 1 } values %count) {
    return 'Some element of array1 does not appear in array2';
} else {
    return 'All elements of array1 are in array2';
}
# N + N log N
my @a = (1,2,3);
my @b = (2,3,1);
print "Equal" if scalar( grep { $_ ~~ @b } @a ) == @b;
Not elegant, but easy to understand:
#!/usr/local/bin/perl
use strict;
my $file1 = shift or die("need file1");
my $file2 = shift or die("need file2");
my @file1lines = split/\n/,`cat $file1`;
my @file2lines = split/\n/,`cat $file2`;
my %lines;
foreach my $file1line(@file1lines){
$lines{$file1line}+=1;
}
foreach my $file2line(@file2lines){
$lines{$file2line}+=2;
}
while(my($key,$value)=each%lines){
if($value == 1){
print "$key is in only $file1\n";
}elsif($value == 2){
print "$key is in only $file2\n";
}elsif($value == 3){
print "$key is in both $file1 and $file2\n";
}
}
exit;
__END__
Try using List::Compare. It has solutions for all the operations that can be performed on arrays.