reading columns from Hash of Arrays - perl

I'm new to Perl and have a question about using hashes of arrays to retrieve specific columns. My code is the following:
my %hash = ( name1 => ['A', 'A', 'B', 'A', 'A', 'B'],
name2 => ['A', 'A', 'D', 'A', 'A', 'B'],
name3 => ['A', 'A', 'B', 'A', 'A', 'C'],
);
#the values of %hash are returned as arrays not as string (as I want)
foreach my $name (sort keys %hash ) {
print "$name: ";
print "$hash{$name}[2]\n";
}
for (my $i=0; $i<$length; $i++) {
my $diff = "no";
my $letter = '';
foreach $name (sort keys %hash) {
if (defined $hash{$name}[$i]) {
if ($hash{$name}[$i] =~ /[ABCD]/) {
$letter = $hash{$name}[$i];
}
elsif ($hash{$name}[$i] ne $letter) {
$diff = "yes";
}
}
if ( $diff eq "yes" ) {
foreach $name (sort keys %hash) {
if (defined $hash{$name}[$i]) { $newhash{$name} .= $hash{$name}[$i]; }
}
}
}
}
foreach $name (sort keys %newhash ) {
print "$name: $newhash{$name} \n";
}
I want the output of this program to be something like a new hash with only the variable columns:
my %newhash = ( name1 => 'BB',
name2 => 'DB',
name3 => 'BC',
);
but I only get this message:
Use of uninitialized value $letter in string ne at test_hash.pl line 31.
Does anyone have ideas about this?
Cheers
EDIT:
Many thanks for your help in this question.
I edited my post to conform with the suggestions of frezik, Dan1111, Jean. You're right, now there are no warnings, but I also can't get any output from the print statement and I don't have a clue why...
@TLP: OK, I just generated a random set of columns without any particular order. What I really care about is how the letters vary: if, for the same array index (stored in the hash), the letters are the same, discard them; but if the letters differ between keys, I want to store that index column in a new hash.
Cheers.

I assume that by this, you want to match any of the letters A,B,C, or D:
if ($hash{$name}[$i] =~ /ABCD/)
However, as written, it matches the exact string "ABCD". You need a character class for what you want:
if ($hash{$name}[$i] =~ /[ABCD]/)
However, you have other logic problems as well, that can lead you to compare to $letter before it has been set. Setting it to empty (as Jean suggested) is a simple option that may help.
Another problem is here:
print "$name: @{ $newhash{$name} }\n";
%newhash is not a hash of arrays, so you need to remove the array dereference:
print "$name: $newhash{$name} \n";
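To see the difference between the two patterns side by side, here is a minimal sketch. The sample strings are made up for illustration and are not the question's data:

```perl
use strict;
use warnings;

# Hypothetical sample values, not taken from the question's data.
my @samples = ('A', 'D', 'ABCD', 'X');

# /ABCD/ matches only strings containing the literal sequence "ABCD".
my @literal = grep { /ABCD/ } @samples;

# /[ABCD]/ matches strings containing any one of A, B, C or D.
my @class = grep { /[ABCD]/ } @samples;

print "literal: @literal\n";   # literal: ABCD
print "class:   @class\n";     # class:   A D ABCD
```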

You may be interested in this alternative solution
use strict;
use warnings;
my %hash = (
name1 => ['A', 'A', 'B', 'A', 'A', 'B'],
name2 => ['A', 'A', 'D', 'A', 'A', 'B'],
name3 => ['A', 'A', 'B', 'A', 'A', 'C'],
);
my @columns;
for my $list (values %hash) {
$columns[$_]{$list->[$_]}++ for 0 .. $#$list;
}
my %newhash = %hash;
for my $list (values %newhash) {
$list = join '', map $list->[$_], grep keys %{$columns[$_]} > 1, 0 .. $#$list;
}
use Data::Dump;
dd \%newhash;
output
{ name1 => "BB", name2 => "DB", name3 => "BC" }

I think it's a mistake to check the letters one by one. It seems easier to just collect all the letters and check them at once. The List::MoreUtils module's uniq function can then quickly determine if the letters vary, and they can be transposed into the resulting hash easily.
use strict;
use warnings;
use Data::Dumper;
use List::MoreUtils qw(uniq);
my %hash = ( name1 => ['A', 'A', 'B', 'A', 'A', 'B'],
name2 => ['A', 'A', 'D', 'A', 'A', 'B'],
name3 => ['A', 'A', 'B', 'A', 'A', 'C'],
);
my @keys = keys %hash;
my $len = $#{ $hash{$keys[0]} }; # max index
my %new;
for my $i (0 .. $len) {
my @col;
for my $key (@keys) {
push @col, $hash{$key}[$i];
}
if (uniq(@col) != 1) { # check for variation
for (0 .. $#col) {
$new{$keys[$_]} .= $col[$_];
}
}
}
print Dumper \%new;
Output:
$VAR1 = {
'name2' => 'DB',
'name1' => 'BB',
'name3' => 'BC'
};

Your scalar $letter is not defined. Add this to get rid of the warning.
my $letter='';

if ($hash{$name}[$i] =~ /ABCD/) {
The regex above would match a string like __ABCD__ or ABCD1234, but never a lone A or B. You probably wanted to match any one of those letters, and it's a good idea to anchor the regex, too:
if ($hash{$name}[$i] =~ /\A [ABCD] \z/x) {
(The /x option means that whitespace is ignored, which helps make regexes a bit easier to read.)
You would still get the warning in the example above when $i == 2 and the inner loop happens to hit the keys name1 or name3 first. Since the regex doesn't match T, $letter will remain uninitialized.
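As a rough illustration of what anchoring changes, the sketch below runs both patterns over a few invented inputs (the `@inputs` list is an assumption, chosen only to show the effect):

```perl
use strict;
use warnings;

# Hypothetical inputs chosen to show the effect of anchoring.
my @inputs = ('A', 'ABCD1234', '__B__', 'T');

# Unanchored: true if the string contains A, B, C or D anywhere.
my @loose = grep { /[ABCD]/ } @inputs;

# Anchored: true only if the whole string is exactly one of those letters.
my @exact = grep { /\A [ABCD] \z/x } @inputs;

print "loose: @loose\n";   # loose: A ABCD1234 __B__
print "exact: @exact\n";   # exact: A
```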

Great. Many thanks for all your help in this question.
I tried code based on TLP's suggestion and it worked just fine. Because I'm relatively new to Perl, I found this code easier to understand than Borodin's. What I did was:
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);
my %hash = ( name1 => ['A', 'A', 'T', 'A', 'A', 'T', 'N', 'd', 'd', 'D', 'C', 'T', 'T', 'T'],
name2 => ['A', 'A', 'D', 'A', 'A', 'T', 'A', 'd', 'a', 'd', 'd', 'T', 'T', 'C'],
name3 => ['A', 'A', 'T', 'A', 'A', 'C', 'A', 'd', 'd', 'D', 'C', 'T', 'C', 'T'],
);
my @keys = keys %hash;
my $len = $#{ $hash{$keys[0]} }; # max index
my %new;
for (my $i=0; $i<=$len; $i++) {
my @col;
for my $key (@keys) {
if ($hash{$key}[$i] =~ /[ABCDT]/) { #added a pattern match
push @col, $hash{$key}[$i];
}
}
if (uniq(@col) != 1) { # check for variation
for (0 .. $#col) {
$new{$keys[$_]} .= $col[$_];
}
}
}
foreach my $key (sort keys %new ) {
print "$key: $new{$key}\n";
}
However, when playing with the uniq function (if (uniq(@col) == 1)), I noticed that the output was a little buggy:
name1: AAAAADCT
name2: AAAAADCT
name3: AAAAT
It seems that it is not preserving the initial order of keys => values. Does anyone have a hint about this?
Cheers.
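One likely explanation, offered as a reading of the posted code rather than a confirmed diagnosis: values that fail the pattern match are never pushed, so the column array can end up shorter than the key list, and position N in the column no longer belongs to key N. The sketch below keys each column by name instead. The `%col` and `%seen` hashes and the shortened sample data are assumptions for illustration, and what should happen to non-matching entries is left open:

```perl
use strict;
use warnings;

# Reduced sample data (an assumption, shortened from the post above).
my %hash = (
    name1 => ['A', 'T', 'N', 'd'],
    name2 => ['A', 'D', 'A', 'a'],
    name3 => ['A', 'T', 'A', 'd'],
);

my @keys = sort keys %hash;
my $len  = $#{ $hash{ $keys[0] } };    # max index
my %new;

for my $i (0 .. $len) {
    # Key the column by name so a skipped value cannot shift the others.
    my %col;
    for my $key (@keys) {
        my $value = $hash{$key}[$i];
        $col{$key} = $value if defined $value && $value =~ /[ABCDT]/;
    }

    # Count distinct letters by hand (stands in for uniq from List::MoreUtils).
    my %seen = map { $_ => 1 } values %col;
    next unless scalar(keys %seen) > 1;

    $new{$_} .= $col{$_} for sort keys %col;
}

print "$_: $new{$_}\n" for sort keys %new;
# name1: T
# name2: D
# name3: T
```

Only index 1 is kept here: index 2 has a skipped 'N' and the remaining letters agree, and index 3 has no matching letters at all.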

Related

create an array from a part of a hash in perl

I have a hash which contains a sub-hash. I want to extract that sub-hash separately and create an array from it.
The hash looks like:
'a1' => '1',
'a2' => '2',
'Def' => [
'd' => 'x',
'e' => 'y'
]
I need to make a separate hash for 'Def' and print only 'Def' as an array.
It's hard to tell from your question exactly what you are trying to achieve, but my interpretation is that you want to extract the anonymous hash stored under def and copy it into another hash. Then you want to print this hash as an array. I have also included examples that print just the keys or just the values of the hash.
use strict;
use Data::Dumper;
my %first_hash = (
a1 => '1',
a2 => '2',
def => {
d => 'x',
e => 'y'
}
);
my %second_hash = %{$first_hash{'def'}};
my @full_array = %second_hash;
my @keys_array = keys %second_hash;
my @values_array = values %second_hash;
print Dumper (\%first_hash);
print Dumper (\%second_hash);
print "full array: ", join(' ',@full_array), "\n";
print "keys array: ", join(' ',@keys_array), "\n";
print "values array: ", join(' ',@values_array), "\n";
OUTPUT
$VAR1 = {
'a2' => '2',
'def' => {
'e' => 'y',
'd' => 'x'
},
'a1' => '1'
};
$VAR1 = {
'e' => 'y',
'd' => 'x'
};
full array: e y d x
keys array: e d
values array: y x
Below you'll find the answer.
print "@{$a{'Def'}}";

Printing values of an array from an array of array references

How can I print the values of an array? I have tried several ways but I am unable to get the required values out of the arrays:
@array;
Dumper output is as below :
$VAR1 = [
'a',
'b',
'c'
];
$VAR1 = [
'd',
'e',
'f'
];
$VAR1 = [
'g',
'h',
'i'
];
$VAR1 = [
'j',
'k',
'l'
];
for my $value (@array) {
my $ip = $value->[0];
DEBUG("DEBUG '$ip\n'");
}
I am getting the output below, which means that for each instance I am only getting the first value.
a
d
g
j
I have tried several approaches :
First option :
my $size = @array;
for ($n=0; $n < $size; $n++) {
my $value=$array[$n];
DEBUG( "DEBUG: Element is as $value" );
}
Second Option :
for my $value (@array) {
my $ip = $value->[$_];
DEBUG("DEBUG Element is '$ip\n'");
}
What is the best way to do this?
It is obvious that you have a list of arrays. In your first example you only loop over the top-level list and print the first (0th) value of each inner array. Barring any automatic dumpers, you need to loop over both levels.
for my $value (@array) {
for my $ip (@$value) {
DEBUG("DEBUG '$ip\n'");
}
}
You want to dereference here, so you need to do something like:
use feature 'say';
my @array_of_arrays = ([qw/a b c/], [qw/d e f/], [qw/i j k/]);
for my $anon_array (@array_of_arrays) { say for @{$anon_array} }
Or using your variable names:
use strict;
use warnings;
my @array = ([qw/a b c/], [qw/d e f/], [qw/i j k/]);
for my $ip (@array) {
print join "", @{$ip} , "\n"; # or "say"
}
Since there are anonymous arrays involved I have focused on dereferencing (using PPB style!) instead of nested loops, but print for is a loop in disguise really.
Cheers.

How do I loop through a hash?

Given the following variable:
$test = {
'1' => 'A',
'2' => 'B',
'3' => 'C',
'4' => 'G',
'5' => 'K',
}
How can I loop through all assignments without knowing which keys I have?
I would like to fill a select box with the results as label and the keys as hidden values.
Just do a foreach loop on the keys:
#!/usr/bin/perl
use strict;
use warnings;
my $test = {
'1' => 'A',
'2' => 'B',
'3' => 'C',
'4' => 'G',
'5' => 'K',
};
foreach my $key(keys %$test) {
print "key=$key : value=$test->{$key}\n";
}
output:
key=4 : value=G
key=1 : value=A
key=3 : value=C
key=2 : value=B
key=5 : value=K
You can use the built-in function each:
while (my ($key, $value) = each %$test) {
print "key: $key, value: $value\n";
}
You can find out what keys you have with keys
my @keys = keys %$test; # Note that you need to dereference the hash here
Or you could just do the whole thing in one pass:
print map { "<option value='$_'>$test->{$_}</option>" } keys %$test;
But you'd probably want some kind of order:
print map { "<option value='$_'>$test->{$_}</option>" } sort keys %$test;
… and you'd almost certainly be better off moving the HTML generation out to a separate template system.

Perl, convert numerically-keyed hash to array

If I have a hash in Perl that contains complete and sequential integer mappings (i.e., all keys from 0 to n are mapped to something, and there are no keys outside this range), is there a means of converting this to an array?
I know I could iterate over the key/value pairs and place them into a new array, but something tells me there should be a built-in means of doing this.
You can extract all the values from a hash with the values function:
my #vals = values %hash;
If you want them in a particular order, then you can put the keys in the desired order and then take a hash slice from that:
my @sorted_vals = @hash{sort { $a <=> $b } keys %hash};
If your original data source is a hash:
# first find the max key value, if you don't already know it:
use List::Util 'max';
my $maxkey = max keys %hash;
# get all the values, in order
my @array = @hash{0 .. $maxkey};
Or if your original data source is a hashref:
my $maxkey = max keys %$hashref;
my @array = @{$hashref}{0 .. $maxkey};
This is easy to test using this example:
my %hash;
@hash{0 .. 9} = ('a' .. 'j');
# insert code from above, and then print the result...
use Data::Dumper;
print Dumper(\%hash);
print Dumper(\@array);
$VAR1 = {
'6' => 'g',
'3' => 'd',
'7' => 'h',
'9' => 'j',
'2' => 'c',
'8' => 'i',
'1' => 'b',
'4' => 'e',
'0' => 'a',
'5' => 'f'
};
$VAR1 = [
'a',
'b',
'c',
'd',
'e',
'f',
'g',
'h',
'i',
'j'
];
OK, this is not very "built in" but works. It's also IMHO preferable to any solution involving "sort" as it's faster.
map { $array[$_] = $hash{$_} } keys %hash; # Or use foreach instead of map
Otherwise, less efficient:
my @array = map { $hash{$_} } sort { $a<=>$b } keys %hash;
Perl does not provide a built-in to solve your problem.
If you know that the keys cover a particular range 0..N, you can leverage that fact:
my $n = keys(%hash) - 1;
my @keys_and_values = map { $_ => $hash{$_} } 0 .. $n;
my @just_values = @hash{0 .. $n};
This will leave keys not defined in %hashed_keys as undef:
# if we're being nitpicky about when and how much memory
# is allocated for the array (for run-time optimization):
my @keys_arr = (undef) x scalar keys %hashed_keys;
@keys_arr[(keys %hashed_keys)] =
@hashed_keys{(keys %hashed_keys)};
And, if you're using references:
@{$keys_arr}[(keys %{$hashed_keys})] =
@{$hashed_keys}{(keys %{$hashed_keys})};
Or, more dangerously, as it assumes what you said is true (it may not always be true … Just sayin'!):
@keys_arr = @hashed_keys{(sort {$a <=> $b} keys %hashed_keys)};
But this is sort of beside the point. If they were integer-indexed to begin with, why are they in a hash now?
As DVK said, there is no built in way, but this will do the trick:
my @array = map {$hash{$_}} sort {$a <=> $b} keys %hash;
or this:
my @array;
keys %hash;
while (my ($k, $v) = each %hash) {
$array[$k] = $v
}
benchmark to see which is faster, my guess would be the second.
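For anyone who wants to run that comparison, here is a minimal sketch using the core Benchmark module. The 1000-key test hash and the sub names `by_sort` and `by_each` are assumptions for illustration; timings will vary by machine and Perl version, so no result is claimed here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: keys 0 .. 999 mapped to arbitrary values.
my %hash = map { $_ => "v$_" } 0 .. 999;

# Approach 1: sort the keys numerically, then look up each value.
sub by_sort {
    my @array = map { $hash{$_} } sort { $a <=> $b } keys %hash;
    return \@array;
}

# Approach 2: walk the hash with each and assign by index.
sub by_each {
    my @array;
    keys %hash;    # reset the iterator before calling each
    while (my ($k, $v) = each %hash) {
        $array[$k] = $v;
    }
    return \@array;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, { 'sort_map' => \&by_sort, 'each_loop' => \&by_each });
```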
@a = @h{sort { $a <=> $b } keys %h};
Combining FM's and Ether's answers allows one to avoid defining an otherwise unnecessary scalar:
my @array = @hash{ 0 .. $#{[ keys %hash ]} };
The neat thing is that unlike with the scalar approach, $# works above even in the unlikely event that the default index of the first element, $[, is non-zero.
Of course, that would mean writing something silly and obfuscated like so:
my @array = @hash{ $[ .. $#{[ keys %hash ]} }; # Not recommended
But then there is always the remote chance that someone needs it somewhere (wince)...
$Hash_value =
{
'54' => 'abc',
'55' => 'def',
'56' => 'test',
};
while (my ($key,$value) = each %{$Hash_value})
{
print "\n $key > $value";
}
We can write a while loop as below:
$j = 0;
while (($a1, $b1) = each(%hash1)) {
$arr[$j][0] = $a1;
($arr[$j][1],$arr[$j][2],$arr[$j][3],$arr[$j][4],$arr[$j][5],$arr[$j][6]) = @{$b1}; # dereference the array ref
$j++;
}
$a1 contains the key and
$b1 contains the value (an array reference).
In the above example I have a hash of arrays, and each array contains 6 elements.
An easy way is to do @array = %hash
For example,
my %hash = (
"0" => "zero",
"1" => "one",
"2" => "two",
"3" => "three",
"4" => "four",
"5" => "five",
"6" => "six",
"7" => "seven",
"8" => "eight",
"9" => "nine",
"10" => "ten",
);
my @array = %hash;
print "@array"; would produce the following output,
3 three 9 nine 5 five 8 eight 2 two 4 four 1 one 10 ten 7 seven 0 zero
6 six
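Note that the list assignment above yields alternating keys and values in hash order, so the result is not indexed by key. A minimal sketch contrasting it with a slice over the numerically sorted keys (the three-entry hash and the names `@flat` and `@by_index` are illustrative assumptions):

```perl
use strict;
use warnings;

my %hash = (0 => 'zero', 1 => 'one', 2 => 'two');

# Flattening: alternating key/value pairs, in no guaranteed order.
my @flat = %hash;

# Slice over numerically sorted keys: values only, index matches key.
my @by_index = @hash{ sort { $a <=> $b } keys %hash };

print scalar(@flat), " elements in \@flat\n";   # 6 elements in @flat
print "@by_index\n";                            # zero one two
```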

Reading sections from a file in Perl

I am trying to read values from an input file in Perl.
Input file looks like:
1-sampledata1 This is a sample test
and data for this continues
2-sampledata2 This is sample test 2
Data for this also is on second line
I want to read the above data so that data for 1-sampledata1 goes into @array1 and data for 2-sampledata2 goes in @array2 and so on.
I will have about 50 sections like this. like 50-sampledata50.
EDIT: The names won't always be X-sampledataX. I just did that as an example. So the names can't be generated in a loop. I think I'll have to type them manually.
I so far have the following (which works). But I am looking for a more efficient way to do this..
foreach my $line(@body){
if ($line=~ /^1-sampledata1\s/){
$line=~ s/1-ENST0000//g;
$line=~ s/\s+//g;
push (@array1, $line);
#using splitarray because i want to store data as one character each
#for ex: i wana store 'This' as T H I S in different elements of array
@splitarray1= split ('',$line);
last if ($line=~ /2-sampledata2/);
}
}
foreach my $line(@body){
if ($line=~ /^2-sampledata2\s/){
$line=~ s/2-ENSBTAP0//g;
$line=~ s/\s+//g;
@splitarray2= split ('',$line);
last if ($line=~ /3-sampledata3/);
}
}
As you can see, I have a different array and a different for loop for each section. If I go with the approach I have so far, I will end up with 50 for loops and 50 arrays.
Is there another better way to do this? In the end I do want to end up with 50 arrays but do not want to write 50 for loops. And since I will be looping through the 50 arrays later on in the program, maybe store them in an array? I am new to Perl so its kinda overwhelming ...
The first thing to notice is that you are trying to use variable names with integer suffixes: Don't. Use an array whenever you find yourself wanting to do that. Second, you only need to go over the file contents once, not multiple times. Third, there is usually no good reason in Perl to treat a string as an array of characters.
Update: This version of the code uses existence of leading spaces to decide what to do. I am leaving the previous version up as well for reference.
#!/usr/bin/perl
use strict;
use warnings;
my @data;
while ( my $line = <DATA> ) {
chomp $line;
if ( $line =~ s/^ +/ / ) {
push @{ $data[-1] }, split //, $line;
}
else {
push @data, [ split //, $line ];
}
}
use Data::Dumper;
print Dumper \@data;
__DATA__
1-sampledata1 This is a sample test
and data for this continues
2-sampledata2 This is sample test 2
Data for this also is on second line
Previous version:
#!/usr/bin/perl
use strict;
use warnings;
my @data;
while ( my $line = <DATA> ) {
chomp $line;
$line =~ s/\s+/ /g;
if ( $line =~ /^[0-9]+-/ ) {
push @data, [ split //, $line ];
}
else {
push @{ $data[-1] }, split //, $line;
}
}
use Data::Dumper;
print Dumper \@data;
__DATA__
1-sampledata1 This is a sample test
and data for this continues
2-sampledata2 This is sample test 2
Data for this also is on second line
#! /usr/bin/env perl
use strict;
use warnings;
my %data;
{
my( $key, $rest );
while( my $line = <> ){
unless( ($rest) = $line =~ /^ \s+(.*)/x ){
($key, $rest) = $line =~ /^(.*?)\s+(.*)/;
}
push @{ $data{$key} }, $rest;
}
}
The code below is very similar to @Brad Gilbert's and @Sinan Unur's solutions:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my (%arrays, $label);
while (my $line = <DATA>)
{
($label, $line) = ($1, $2) if $line =~ /^(\S+)(.*)/; # new data block
$line =~ s/^\s+//; # strip whitespace from the beginning
# append data for corresponding label
push @{$arrays{$label}}, split('', $line) if defined $label;
}
print $arrays{'1-sampledata1'}[2], "\n"; # 'i'
print join '-', @{$arrays{'2-sampledata2'}}; # 'T-h-i-s- -i-s- -s-a-m-p-l
print Dumper \%arrays;
__DATA__
1-sampledata1 This is a sample test
and data for this continues
2-sampledata2 This is sample test 2
Data for this also is on second line
Output
i
T-h-i-s- -i-s- -s-a-m-p-l-e- -t-e-s-t- -2-D-a-t-a- -f-o-r- -t-h-i-s- -a-l-s-o- -i-s- -o-n- -s-e-c-o-n-d- -l-i-n-e-
$VAR1 = {
'2-sampledata2' => [
'T',
'h',
'i',
's',
' ',
'i',
's',
' ',
's',
'a',
'm',
'p',
'l',
'e',
' ',
't',
'e',
's',
't',
' ',
'2',
'D',
'a',
't',
'a',
' ',
'f',
'o',
'r',
' ',
't',
'h',
'i',
's',
' ',
'a',
'l',
's',
'o',
' ',
'i',
's',
' ',
'o',
'n',
' ',
's',
'e',
'c',
'o',
'n',
'd',
' ',
'l',
'i',
'n',
'e',
'
'
],
'1-sampledata1' => [
'T',
'h',
'i',
's',
' ',
'i',
's',
' ',
'a',
' ',
's',
'a',
'm',
'p',
'l',
'e',
' ',
't',
'e',
's',
't',
'a',
'n',
'd',
' ',
'd',
'a',
't',
'a',
' ',
'f',
'o',
'r',
' ',
't',
'h',
'i',
's',
' ',
'c',
'o',
'n',
't',
'i',
'n',
'u',
'e',
's',
'
'
]
};
You should, instead, use a hash that maps to arrays.
Use this regex pattern to get the index:
/^(\d+)-sampledata(\d+)/
And then, with my %arrays, do:
push @{ $arrays{$index} }, $line;
You can then access the arrays with @{ $arrays{$index} }.
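Put together, the idea might look like the sketch below. It is only an illustration under a few assumptions: the lines are already in `@body` as in the question, section headers start with a number and a dash, and the pattern uses `\S+` rather than the literal `sampledata` because the question says the names vary.

```perl
use strict;
use warnings;

# Sample lines modelled on the question's input; the labels are illustrative.
my @body = (
    '1-sampledata1 This is a sample test',
    '    and data for this continues',
    '2-sampledata2 This is sample test 2',
    '    Data for this also is on second line',
);

my (%arrays, $index);
for my $line (@body) {
    if ($line =~ /^(\d+)-\S+\s+(.*)/) {
        # A new section starts: remember its number, keep the rest of the line.
        $index = $1;
        push @{ $arrays{$index} }, $2;
    }
    elsif (defined $index) {
        # Continuation line: strip leading whitespace and append.
        (my $text = $line) =~ s/^\s+//;
        push @{ $arrays{$index} }, $text;
    }
}

for my $i (sort { $a <=> $b } keys %arrays) {
    print "$i: ", join(' | ', @{ $arrays{$i} }), "\n";
}
# 1: This is a sample test | and data for this continues
# 2: This is sample test 2 | Data for this also is on second line
```

A single pass over the input fills every section, so 50 sections need no more code than two.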