Argument "*" isn't numeric in array element - perl

I want to make a hash of array from a file that looks like:
xx500173:56QWER 45 A rtt34 34C
...
I would like to have a unique "key" (e.g. column1_column2)
#!/usr/bin/perl
use warnings;
use strict;
my $seq;
while(<>){
chomp;
my @line = split(/\s+/, $_);
my $key = $line[0] . "_" . $line[1]; #try to make a unique key for each entry
map { $seq->{ $_->[$key] } = [@$_[0..4]] } [ split/\s+/ ];
}
foreach my $s (keys %{$seq} ) {
print $s,": ",join( "\t", @{ $seq->{$s}} ) . "\n";
}
but I get the following error:
Argument "xx500173:56QWER_45" isn't numeric in array element
Does it matter if the key is numeric or a string?

An array index (the value inside []) must be numeric, but $key is a string. Hash keys, which go inside {}, can be any string. Assuming you want all the whitespace-separated tokens as elements of your array:
use warnings;
use strict;
my $seq;
while (<DATA>) {
chomp;
my @line = split;
my $key = $line[0] . "_" . $line[1]; #try to make a unique key for each entry
$seq->{$key} = [ @line ];
}
foreach my $s ( keys %{$seq} ) {
print $s, ": ", join( "\t", @{ $seq->{$s} } ) . "\n";
}
__DATA__
xx500173:56QWER 45 A rtt34 34C
Outputs:
xx500173:56QWER_45: xx500173:56QWER 45 A rtt34 34C
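To make the bracket distinction concrete, here is a minimal sketch using the sample record from the question. The variable names %seq and $third are only for illustration; it uses a plain hash rather than the hash reference in the code above.

```perl
use strict;
use warnings;

# Sample record from the question, split on whitespace.
my @line = split ' ', 'xx500173:56QWER 45 A rtt34 34C';
my $key  = "$line[0]_$line[1]";

my %seq;
$seq{$key} = [@line];        # hash element: curly braces, any string works as a key

my $third = $seq{$key}[2];   # array element: square brackets, numeric index
printf "%s: third field is %s\n", $key, $third;
```

Using $key inside square brackets instead is what triggers the "isn't numeric in array element" warning.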

You have confused yourself with the line
map { $seq->{ $_->[$key] } = [@$_[0..4]] } [ split/\s+/ ];
which is wrong because
map is an operator for translating one list into another by performing the same operation on every element of the input list, but you are ignoring the returned value
The input list is only one item long - the array reference returned by [ split/\s+/ ]
What you have written is the same as
$_ = [ split /\s+/ ];
$seq->{ $_->[$key] } = [ @$_[0..4] ];
and the problem is that $_->[$key] tries to index the anonymous array using the string $key, which is clearly wrong.
All you need here is
$seq->{$key} = [ @line[0..4] ];
and your complete program should look like this
#!/usr/bin/perl
use strict;
use warnings;
my $seq;
while ( <> ) {
chomp;
my @line = split;
$seq->{"$line[0]_$line[1]"} = [ @line[0..4] ];
}
for my $s ( keys %{$seq} ) {
printf "%s: %s\n", $s, join("\t", @{ $seq->{$s} } );
}

Related

Handling Nested Delimiters in perl

use strict;
use warnings;
my %result_hash = ();
my %final_hash = ();
Compare_results();
foreach my $key (sort keys %result_hash ){
print "$key \n";
print "$result_hash{$key} \n";
}
sub Compare_results
{
while ( <DATA> )
{
my($instance,$values) = split /\:/, $_;
$result_hash{$instance} = $values;
}
}
__DATA__
1:7802315095\d\d,7802315098\d\d;7802025001\d\d,7802025002\d\d,7802025003\d\ d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d
2:7802315095\d\d,7802025002\d\d,7802025003\d\d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d
Output
1
7802315095\d\d,7802315098\d\d;7802025001\d\d,7802025002\d\d,7802025003\d\d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d
2
7802315095\d\d,7802025002\d\d,7802025003\d\d,7802025004\d\d,7802025005\d\d,7802025006\d\d,7802025007\d\d
I am trying to fetch the value of each key and then split the comma-separated value from the result hash. If I find a semicolon in any value, I want to store the left and right values in separate hash keys.
Something like below
1. #split the value of result_hash{$key} again by , and see whether any chunk is separated by ;
2. #every chunk without ; and value on left with ; should be stored in
@{$final_hash{"eto"}} = ['7802315095\d\d','7802315098\d\d','7802025002\d\d','7802025003\d\d','7802025004\d\d','7802025005\d\d','7802025006\d\d','7802025007\d\d'] ;
3. #Anything found on the right side of ; has to be stored in
@{$final_hash{"pro"}} = ['7802025001\d\d'] ;
Is there a way that I can handle everything in the subroutine? Can I make the code simpler?
Update :
I tried splitting the string in a single shot, but it's just picking the values with the semicolon and ignoring everything else
foreach my $key (sort keys %result_hash ){
# print "$key \n";
# print "$result_hash{$key} \n";
my ($o,$t) = split(/,|;/, $result_hash{$key});
print "Left : $o \n";
print "Left : $t \n";
#push @{$final_hash{"eto"}}, $o;
#push @{$final_hash{"pro"}} ,$t;
}
}
My updated code after help
sub Compare_results
{
open my $fh, '<', 'Data_File.txt' or die $!;
# split by colon and further split by , and ; if any (done in insert_array)
my %result_hash = map { chomp; split ':', $_ } <$fh> ;
foreach ( sort { $a <=> $b } (keys %result_hash) )
{
($_ < 21)
? insert_array($result_hash{$_}, "west")
: insert_array($result_hash{$_}, "east");
}
}
sub insert_array()
{
my ($val,$key) = @_;
foreach my $field (split ',', $val)
{
$field =~ s/^\s+|\s+$//g; # / turn off editor coloring
if ($field !~ /;/) {
push @{ $file_data{"pto"}{$key} }, $field ;
}
else {
my ($left, $right) = split ';', $field;
push @{$file_data{"pto"}{$key}}, $left if($left ne '') ;
push @{$file_data{"ero"}{$key}}, $right if($right ne '') ;
}
}
}
Thanks
Update Added a two-pass regex, at the end
Just proceed systematically, analyze the string step by step. The fact that you need consecutive splits and a particular separation rule makes it unwieldy to do in one shot. Better have a clear method than a monster statement.
use warnings 'all';
use strict;
use feature 'say';
my (%result_hash, %final_hash);
Compare_results();
say "$_ => $result_hash{$_}" for sort keys %result_hash;
say '---';
say "$_ => [ @{$final_hash{$_}} ]" for sort keys %final_hash;
sub Compare_results
{
%result_hash = map { chomp; split ':', $_ } <DATA>;
my (@eto, @pro);
foreach my $val (values %result_hash)
{
foreach my $field (split ',', $val)
{
if ($field !~ /;/) { push @eto, $field }
else {
my ($left, $right) = split ';', $field;
push @eto, $left;
push @pro, $right;
}
}
}
$final_hash{eto} = \@eto;
$final_hash{pro} = \@pro;
return 1; # but add checks above
}
There are some inefficiencies here, and no error checking, but the method is straightforward. If your input is anything but smallish, please change the above to process line by line, which you clearly know how to do. It prints
1 => ... (what you have in the question)
---
eto => [ 7802315095\d\d 7802315098\d\d 7802025002\d\d 7802025003\d\ d ...
pro => [ 7802025001\d\d ]
Note that your data does have one loose \d\ d.
We don't need to build the whole hash %result_hash for this but only need to pick the part of the line after :. I left the hash in since it is declared global so you may want to have it around. If it in fact isn't needed on its own this simplifies
sub Compare_results {
my (@eto, @pro);
while (<DATA>) {
my ($val) = /:(.*)/;
foreach my $field (split ',', $val)
# ... same
}
# assign to %final_hash, return from sub
}
Thanks to ikegami for comments.
Just for the curiosity's sake, here it is in two passes with regex
sub compare_rx {
my @data = map { (split ':', $_)[1] } <DATA>;
$final_hash{eto} = [ map { /([^,;]+)/g } @data ];
$final_hash{pro} = [ map { /;([^,;]+)/g } @data ];
return 1;
}
This picks all characters which are not , or ;, using the negated character class, [^,;]. So that is up to the first either of them, left to right. It does this globally, /g, so it keeps going through the string, collecting all fields that are "left of" , or ;. Then it cheats a bit, picking all [^,;] that are right of ;. The map is used to do this for all lines of data.
If %result_hash is needed, build it instead of @data, then pull the values from it with my @values = values %result_hash and feed the map with @values.
Or, broken line by line (again, you can build %result_hash instead of taking $data directly)
my (@eto, @pro);
while (<DATA>) {
my ($data) = /:(.*)/;
push @eto, $data =~ /([^,;]+)/g;
push @pro, $data =~ /;([^,;]+)/g;
}
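One caveat about the regex variants: as far as I can tell, /([^,;]+)/g matches every field, including the ones that follow a ;, so the eto list would also pick up the pro values. Below is a minimal sketch of one way around that, assuming fields are never empty. The sample line and the (?:^|,) anchor are my own additions, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical shortened record in the question's "key:fields" layout.
my $line = '1:A1,B2;C3,D4';

my ($data) = $line =~ /:(.*)/;              # everything after the first colon
my @all    = $data =~ /([^,;]+)/g;          # every field, whatever precedes it
my @eto    = $data =~ /(?:^|,)([^,;]+)/g;   # fields at the start or after a comma
my @pro    = $data =~ /;([^,;]+)/g;         # fields directly after a semicolon

print "all: @all\n";
print "eto: @eto\n";
print "pro: @pro\n";
```

Here @all holds all four fields, while @eto leaves out C3, which lands in @pro only.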

perl match part of string in two files

I'm using a perl script to look for matches between columns in two tab-delimited files. However for one column I only want to look for a partial match between two strings in two columns.
It concerns $row[4] of $table2 and $row{d} of $table1.
The values in $row[4] of $table2 look like this:
'xxxx'.
The values in $row{d} of $table1 look like this:
'xxxx.aaa'.
If the part before the '.' is the same, there is a match. If not, there is no match. I'm not sure how to implement this in my script. This is what I have so far. It only looks for complete matches between different columns. '...' denotes code that is not important for this question.
#! /usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
local $Data::Dumper::Useqq = 1;
use Getopt::Long qw(GetOptions);
...
...
chomp( my @header_table2 = split /\t/, <$table2> );
my %lookup;
while(<$table2>){
chomp;
my @row = split(/\t/);
$lookup{ $row[0] }{ $row[1] }{ $row[4] }{ $row[5] }{ $row[6] }{ $row[7] }{ $row[8] } = [ $row[9], $row[10] ];
}
my @header = do {
my $header = <$table1>;
$header =~ s/\t?\n\z//;
split /\t/, $header;
};
print $table3 join ("\t", @header, qw/ name1 name2 /), "\n";
{
no warnings 'uninitialized';
while(<$table1>){
s/\t?\n\z//;
my %row;
@row{@header} = split /\t/;
print $table3 join ( "\t", @row{@header},
@{ $lookup{ $row{a} }{ $row{b} }{ $row{c} }{ $row{d} }{ $row{e} }{ $row{f} }{ $row{g} }
// [ "", "" ] }), "\n";
}
}
This is looking like a job for a database
The solution below isn't going to work, because you are building your %lookup hash with nine levels of keys ($row[0] .. $row[8]) , and accessing it with only seven levels ($row{a} .. $row{g}), so you're going to have to edit in the real situation
I see no reason to nest your hashes so deeply. A single key formed by using join on the relevant fields will work fine and probably a little faster. I also see no reason to extract table2 fields into an array and table1 fields into a hash. An array seems fine in both cases
I've solved your immediate problem by copying each @row from table1 into array @key, and removing the last dot and anything following from the fourth element before building the $key string
In view of your history of adding a spare tab character before the newline at the end of each record, I've also added four die statements that verify the size of the header row and columns rows before continuing. You will probably need to tweak those values according to your real data
use strict;
use warnings 'all';
use Data::Dumper;
local $Data::Dumper::Useqq = 1;
use Getopt::Long qw(GetOptions);
use constant TABLE1_COLUMNS => 9;
use constant TABLE2_COLUMNS => 11;
open my $table2, '<', 'table2.txt' or die $!;
my @header_table2 = do {
my $header = <$table2>;
$header =~ s/\t?\n\z//;
split /\t/, $header;
};
die "Incorrect table 2 header count " . scalar @header_table2
unless @header_table2 == TABLE2_COLUMNS;
my %lookup;
while ( <$table2> ) {
chomp;
my @row = split /\t/;
die "Incorrect table 2 column count " . scalar @row
unless @row == TABLE2_COLUMNS;
my $key = do {
local $" = "\n";
"@row[0..8]";
};
$lookup{ $key } = [ @row[9,10] ];
}
open my $table1, '<', 'table1.txt' or die $!;
my @header = do {
my $header = <$table1>;
$header =~ s/\t?\n\z//;
split /\t/, $header;
};
die "Incorrect table 1 header count " . scalar @header
unless @header == TABLE1_COLUMNS;
open my $table3, '>', 'table3.txt' or die $!;
print $table3 join ("\t", @header, qw/ name1 name2 /), "\n";
while ( <$table1> ) {
s/\t?\n\z//;
my @row = split /\t/;
die "Incorrect table 1 column count " . scalar @row
unless @row == TABLE1_COLUMNS;
my $key = do {
my @key = @row;
$key[3] =~ s/\.[^.]*\z//;
local $" = "\n";
"@key";
};
my $lookup = $lookup{ $key } // [ "", "" ];
print $table3 join("\t", @row, @$lookup), "\n";
}
You're going to have a scoping problem because your array @row and your hash %row both exist in completely different scopes.
But if you have variables (say, $foo and $bar) and you want to know if $foo starts with the contents of $bar followed by a dot, then you can do that using a regular expression check like this:
if ($foo =~ /^$bar\./) {
# match
} else {
# no match
}
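One thing to watch with the check above: if $bar can contain regex metacharacters such as . or +, they will be treated as pattern syntax. A minimal sketch that quotes the prefix with \Q...\E follows; prefix_matches is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $full is $prefix followed by a dot.
# \Q...\E quotes regex metacharacters in $prefix so they match literally.
sub prefix_matches {
    my ($prefix, $full) = @_;
    return $full =~ /^\Q$prefix\E\./ ? 1 : 0;
}

print prefix_matches('xxxx', 'xxxx.aaa'), "\n";   # exact prefix before the dot
print prefix_matches('xxx',  'xxxx.aaa'), "\n";   # prefix too short
print prefix_matches('x.xx', 'xaxx.aaa'), "\n";   # the '.' in the prefix is literal
```

Only the first call reports a match. Without \Q...\E the third call would match as well, because the dot in 'x.xx' would stand for any character.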

perl: how loop over a hash

I've found a lot of different answers to this question, and none seem to work (?!)
Here's what I have:
my %FORM = ["a"=>"0AD", "b"=>"johnny manziel", "c"=>"lincoln"];
#my @k = keys (%FORM);
#for my $iter (@k) { print "$iter\n"; }
#for my $key (keys %FORM) {
# print "\t";
# print $FORM{$key};
# print "\n";
#}
while ( ($key, $value) = each %FORM )
{
print "key: $key, value: $FORM{$key}\n";
}
typical output:
./testprinthash.pl
key: ARRAY(0x13a2998), value:
I always get an array instead of a key value
You want to use parentheses ( ) when assigning to a hash, not square brackets [ ].
my %FORM = ("a"=>"0AD", "b"=>"johnny manziel", "c"=>"lincoln");
The [ ] create an ARRAY reference, which is not what you want.
Check
http://perldoc.perl.org/perlref.html
http://perldoc.perl.org/perlreftut.html
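For completeness, a minimal sketch of the corrected assignment together with a loop over the hash. Sorting the keys is my own addition, since hash order is not predictable, and @lines and $ref are illustrative names.

```perl
use strict;
use warnings;

# Parentheses build a list of key/value pairs for the hash.
my %FORM = ( a => "0AD", b => "johnny manziel", c => "lincoln" );

# Sort the keys so the output order is stable.
my @lines = map { "key: $_, value: $FORM{$_}" } sort keys %FORM;
print "$_\n" for @lines;

# Square brackets build ONE scalar instead: a reference to an array.
my $ref = [ a => "0AD" ];
print ref($ref), "\n";
```

The last line prints ARRAY, which is the same thing that showed up as the lone key in the question's output.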

building a hash from a string

I have the below string on perl :
my $string = xyz;1;xyz;2;a;2;b;2
i want to build a hash after for this string like below :
my @array =split /;/,$string;
$hash{xyz} =(1,2);
$hash{b}=(2);
$hash{a}=(2);
what is the perl way to do this?
my $string = "xyz;1;xyz;2;a;2;b;2";
my %hash;
push @{$hash{$1}}, $2 while $string =~ s/^(\w+);(\d+);?//g;
Actually
push @{$hash{$1}}, $2 while $string =~ m/(\w+);(\d+);?/g;
would be better, since that doesn't eat up your original string.
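A minimal sketch of the non-destructive version as a complete script, in case it helps to see it run. The pattern is simplified slightly (no trailing ;?), which should behave the same on this input.

```perl
use strict;
use warnings;

my $string = "xyz;1;xyz;2;a;2;b;2";
my %hash;

# m//g in scalar context resumes where the previous match ended,
# so each iteration picks up the next key;value pair.
while ( $string =~ /(\w+);(\d+)/g ) {
    push @{ $hash{$1} }, $2;
}

print "$_ => @{ $hash{$_} }\n" for sort keys %hash;
```

Every key ends up holding an array reference, and $string is left untouched afterwards.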
Assuming you want the multiple values for the same key to be an array reference, then one way to do it is like this:
my @values = split /;/, $string;
my %hash;
while( @values ) {
my $key = shift @values;
my $val = shift @values;
if ( exists $hash{$key} && !ref $hash{$key} ) {
# upgrade to arrayref
$hash{$key} = [ $hash{$key}, $val ];
} elsif ( ref $hash{$key} ) {
push @{ $hash{$key} }, $val;
} else {
$hash{$key} = $val;
}
}
With your data, this will result in a structure like
{
'a' => '2',
'b' => '2',
'xyz' => [
'1',
'2'
]
};
Drats: You have repeating keys... I wanted to do something with map or grep.
This is fairly simple to understand:
my $string = "xyz;1;xyz;2;a;2;b;2";
my @array = split /;/ => $string;
my %hash;
while (@array) {
my ($key, $value) = splice @array, 0, 2;
$hash{$key} = [] if not exists $hash{$key};
push @{$hash{$key}}, $value;
}
This program will work even if the key is not together in your string. For example, the following will work even though xyz is separated by the other value pairs:
my $string = "xyz;1;a;2;b;2;xyz;2";
I am assuming that $hash{b}=(2); means you want the value of $hash{b} to be a reference to a single member array. Is that correct?
Probably the easiest (standard) way to do this is List::MoreUtils::natatime
use List::MoreUtils qw<natatime>;
my $iter = natatime 2 => split /;/, 'xyz;1;xyz;2;a;2;b;2';
my %hash;
while ( my ( $k, $v ) = $iter->()) {
push @{ $hash{ $k } }, $v;
}
However abstracting out the parts that I would probably want to do again...
use List::MoreUtils qw<natatime>;
sub pairs {
my $iter = natatime 2 => @_;
my @pairs;
while ( my ( $k, $v ) = $iter->()) {
push @pairs, [ $k, $v ];
}
return @pairs;
}
sub multi_hash {
my %h;
push @{ $h{ $_->[0] } }, $_->[1] foreach &pairs;
return wantarray ? %h : \%h;
}
my %hash = multi_hash( split /;/, 'xyz;1;xyz;2;a;2;b;2' );

Perl - Hash of hash and columns :(

I've a set of strings with variable sizes, for example:
AAA23
AB1D1
A1BC
AAB212
My goal is have in alphabetical order and unique characters collected for COLUMNS, such as:
first column : AAAA
second column : AB1A
and so on...
For this moment I was able to extract the posts through a hash of hashes. But now, how can I sort data? Could I for each hash of hash make a new array?
Thank you very much for your help!
Al
My code:
#!/usr/bin/perl
use strict;
use warnings;
my @sessions = (
"AAAA",
"AAAC",
"ABAB",
"ABAD"
);
my $length_max = 0;
my $length_tmp = 0;
my %columns;
foreach my $string (@sessions){
my $l = length($string);
if ($l > $length_tmp){
$length_max = $l;
}
}
print "max legth : $length_max\n\n";
my $n = 1;
foreach my $string (@sessions){
my @ch = split("",$string);
for my $col (1..$length_max){
$columns{$n}{$col} = $ch[$col-1];
}
$n++;
}
foreach my $col (keys %columns) {
print "colonna : $col\n";
my $deref = $columns{$col};
foreach my $pos (keys %$deref){
print " posizione : $pos --> $$deref{$pos}\n";
}
print "\n";
}
exit(0);
What you're doing is rotating the array. It doesn't need a hash of hash or anything, just another array. Surprisingly, neither List::Util nor List::MoreUtils supplies one. Here's a straightforward implementation with a test. I presumed you want short entries filled in with spaces so the columns come out correct.
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use List::Util qw(max);
my @Things = qw(
AAA23
AB1D1
A1BC
AAB212
);
sub rotate {
my @rows = @_;
my $maxlength = max map { length $_ } @rows;
my @columns;
for my $row (@rows) {
my @chars = split //, $row;
for my $colnum (1..$maxlength) {
my $idx = $colnum - 1;
$columns[$idx] .= $chars[$idx] || ' ';
}
}
return @columns;
}
sub print_columns {
my @columns = @_;
for my $idx (0..$#columns) {
printf "Column %d: %s\n", $idx + 1, $columns[$idx];
}
}
sub test_rotate {
is_deeply [rotate @_], [
"AAAA",
"AB1A",
"A1BB",
"2DC2",
"31 1",
"   2",
];
}
test_rotate(@Things);
print_columns(@Things);
done_testing;
You can sort the output of %columns in your code with
foreach my $i (sort { $a <=> $b } keys %columns) {
print join(" " => sort values %{ $columns{$i} }), "\n";
}
This gives
A A A A
A A A C
A A B B
A A B D
But using index numbers as hash keys screams that you should use an array instead, so let's do that. To get the columns, use
sub columns {
my @strings = @_;
my @columns;
while (@strings) {
push @columns => [ sort map s/^(.)//s ? $1 : (), @strings ];
@strings = grep length, @strings;
}
@columns;
}
Given the strings from your question, it returns
A A A A
1 A A B
1 A B B
2 2 C D
1 1 3
2
As you can see, this is unsorted and repeats characters. With Perl, when you see the word unique, always think of hashes!
sub unique_sorted_columns {
map { my %unique;
++$unique{$_} for @$_;
[ sort keys %unique ];
}
columns @_;
}
If you don't mind destroying information, you can have columns sort and filter duplicates:
sub columns {
my @strings = @_;
my @columns;
while (@strings) {
my %unique;
map { ++$unique{$1} if s/^(.)//s } @strings;
push @columns => [ sort keys %unique ];
@strings = grep length, @strings;
}
@columns;
}
Output:
A
1 A B
1 A B
2 C D
1 3
2
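If you would rather not consume the strings with s///, a minimal sketch of an index-based variant is below. unique_columns is a hypothetical name, and it returns each column as a joined string rather than an array reference, which is a choice of mine.

```perl
use strict;
use warnings;

# Hypothetical helper: for each character position, collect the distinct
# characters found there across all strings, in sorted order.
sub unique_columns {
    my @strings = @_;
    my @seen;    # one hash of seen characters per column
    for my $string (@strings) {
        my @chars = split //, $string;
        $seen[$_]{ $chars[$_] } = 1 for 0 .. $#chars;
    }
    return map { join '', sort keys %$_ } @seen;
}

my @columns = unique_columns(qw(AAA23 AB1D1 A1BC AAB212));
print "$_\n" for @columns;
```

For the strings in the question this gives the same columns as the output above, without the separating spaces.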