if exist a hash key add the new value to existing value - perl

I have a hash structure and I want to add new value to the existing value (not update with new value ).
here is my code.
use strict;
use warnings;
my %hash;
while(<DATA>){
my $line=$_;
my ($ID)=$line=~/ID=(.*?);/;
#make a hash with ID as key
if (!exists $hash{$ID}){
$hash{$ID}= $line;
}
else{
#add $line to the existing value
}
}
for my $key(keys %hash){
print $key.":".$hash{$key}."\n";
}
__DATA__
ID=13_76; gi|386755343
ID=13_75; gi|383750074
ID=13_75; gi|208434224
ID=13_76; gi|410023515
ID=13_77; gi|499086767

else{
$hash{$ID} .= $line;
}

All you need is $hash{$ID} .= $line;. No if-elses.
If there's no key $ID in the hash it would create one and concatenate $line to empty string, resulting exactly what you need.

You should store your data in a hash of arrays:
#!/usr/bin/env perl
use strict;
use warnings;
# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars ); # Avoids regex performance penalty
use Data::Dumper;
# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;
# conditional compile DEBUGging statements
# See http://lookatperl.blogspot.ca/2013/07/a-look-at-conditional-compiling-of.html
use constant DEBUG => $ENV{DEBUG};
# --------------------------------------
my %HoA = ();
while( my $line = <DATA> ){
if( my ( $ID ) = $line =~ m{ ID \= ([^;]+) }msx ){
push #{ $HoA{$ID} }, $line;
}
}
print Dumper \%HoA;
__DATA__
ID=13_76; gi|386755343
ID=13_75; gi|383750074
ID=13_75; gi|208434224
ID=13_76; gi|410023515
ID=13_77; gi|499086767

Related

Perl, Split string into Key:Value pairs for hash with lowercase keys without temporary array

Given a string of Key:Value pairs, I want to create a lookup hash but with lowercase values for the keys. I can do so with this code
my $a="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my #a = split '\|', $a;
my %b = map { $a[$_] = ( !($_ % 2) ? lc($a[$_]) : $a[$_]) } 0 .. $#a ;
The resulting Hash would look like this Dumper output:
$VAR1 = {
'key3' => 'Value3',
'key2' => 'Value2',
'key1' => 'Value1'
};
Would it be possible to directly create hash %b without using temporary array #a or is there a more efficient way to achieve the same result?
Edit: I forgot to mention that I cannot use external modules for this. It needs to be basic Perl.
You can use pairmap from List::Util to do this without an intermediate array at all.
use strict;
use warnings;
use List::Util 1.29 'pairmap';
my $str="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %hash = pairmap { lc($a) => $b } split /\|/, $str;
Note: you should never use $a or $b outside of sort (or List::Util pair function) blocks. They are special global variables for sort, and just declaring my $a in a scope can break all sorts (and List::Util pair functions) in that scope. An easy solution is to immediately replace them with $x and $y whenever you find yourself starting to use them as example variables.
Since the key-value pair has to be around the | you can use a regex
my $v = "KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %h = split /\|/, $v =~ s/([^|]+) \| ([^|]+)/lc($1).q(|).$2/xger;
use strict;
use warnings;
use Data::Dumper;
my $i;
my %hash = map { $i++ % 2 ? $_ : lc } split(/\|/, 'KEY1|Value1|kEy2|Value2|KeY3|Value3');
print Dumper(\%hash);
Output:
$VAR1 = {
'key1' => 'Value1',
'key2' => 'Value2',
'key3' => 'Value3'
};
For fun, here are two additional approaches.
A cheaper one than the original (since the elements are aliased rather than copied into #_):
my %hash = sub { map { $_ % 2 ? $_[$_] : lc($_[$_]) } 0..$#_ }->( ... );
A more expensive one than the original:
my %hash = ...;
#hash{ map lc, keys(%hash) } = delete( #hash{ keys(%hash) } );
More possible solutions using regexes to do all the work, but not very pretty unless you really like regex:
use strict;
use warnings;
my $str="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %hash;
my $copy = $str;
$hash{lc $1} = $2 while $copy =~ s/^([^|]*)\|([^|]*)\|?//;
use strict;
use warnings;
my $str="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %hash;
$hash{lc $1} = $2 while $str =~ m/\G([^|]*)\|([^|]*)\|?/g;
use strict;
use warnings;
my $str="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %hash = map { my ($k, $v) = split /\|/, $_, 2; (lc($k) => $v) }
$str =~ m/([^|]*\|[^|]*)\|?/g;
Here's a solution that avoids mutating the input string, constructing a new string of the same length as the input string, or creating an intermediate array in memory.
The solution here changes the split into looping over a match statement.
#! /usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $a="KEY1|Value1|kEy2|Value2|KeY3|Value3";
sub normalize_alist_opt {
my ($input) = #_;
my %c;
my $last_key;
while ($input =~ m/([^|]*(\||\z)?)/g) {
my $s = $1;
next unless $s ne '';
$s =~ s/\|\z//g;
if (defined $last_key) {
$c{ lc($last_key) } = $s;
$last_key = undef;
} else {
$last_key = $s;
}
}
return \%c;
}
print Dumper(normalize_alist_opt($a));
A potential solution that operates over the split directly. Perl might recognize and optimize the special case. Although based on discussions here and here, I'm not sure.
sub normalize_alist {
my ($input) = #_;
my %c;
my $last_key;
foreach my $s (split /\|/, $input) {
if (defined $last_key) {
$c{ lc($last_key) } = $s;
$last_key = undef;
} else {
$last_key = $s;
}
}
return \%c;
}

Iterate over hash of arrays from function

I have an input file:
id_1 10 15 20:a:4:c
id_2 1 5 2:2:5:c
id_3 0.4 3 12:1:4:1
id_4 18 2 9:1:0/0:1
id_5 a b c:a:foo:2
I have many files of this type that I want to parse in different programs, so I want to make a function that returns a hash with easily accessible.
I've not written a function like this before, and I'm not sure how to properly access the returned hashes. Here's what I've got so far:
Library_SO.pm
#!/urs/bin/perl
package Library_SO;
use strict;
use warnings;
sub tum_norm_gt {
my $file = shift;
open my $in, '<', $file or die $!;
my %SVs;
my %info;
while(<$in>){
chomp;
my ($id, $start, $stop, $score) = split;
my #vals = (split)[1..2];
my #score_fields = split(/:/, $score);
$SVs{$id} = [ $start, $stop, $score ];
push #{$info{$id}}, #score_fields ;
}
return (\%SVs, \%info);
}
1;
And my main script:
get_vals.pl
#!/urs/bin/perl
use Library_SO;
use strict;
use warnings;
use feature qw/ say /;
use Data::Dumper;
my $file = shift or die $!;
my ($SVs, $info) = Library_SO::tum_norm_gt($file);
print Dumper \%$SVs;
print Dumper \%$info;
# for (keys %$SVs){
# say;
# my #vals = #{$SVs{$_}}; <- line 20
# }
I call this with:
perl get_vals.pl test_in.txt
The Dumper output is what I was hoping for, but when I try to iterate over the returned hash(?) and access the values (e.g. as in the commented out section) I get:
Global symbol "%SVs" requires explicit package name at get_vals.pl line 20.
Execution of get_vals.pl aborted due to compilation errors.
Have I got this totally upside down?
Your library function returns two hashrefs. If you now want to access the values you'll have to dereference the hashref:
my ($SVs, $info) = Library_SO::tum_norm_gt($file);
#print Dumper \%$SVs;
# Easier and better readable:
print Dumper $SVs ;
#print Dumper \%$info;
# Easier and better readable:
print Dumper $info ;
for (keys %{ $SVs } ){ # Better visual derefencing
say;
my #vals = #{$SVs->{$_}}; # it is not $SVs{..} but $SVs->{...}
}

Odd number of elements in hash assignment with default

Greeting Dear Community.
I'm trying to make a sub in perl that takes a hash and a debug flag which defaults to zero. However I keep getting this error Odd number of elements in hash assignment. If I don't use the debug flag, it seems to work.
Thanks for all the help.
Code:
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use POSIX qw(strftime);
#
#file2hash : read the file in k<file_name> e.g.=kconfig & kmem into hash table
#
sub file2hash {
my ($file) = #_;
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
my %HoH;
my $key;
my $value;
my $who;
my $rec;
my $field;
#while ( my $line = <$data>) {
while ( <$data>) {
#print $line;
next unless (s/^(.*?):\s*//); # / turn off editor coloring
$who = $1;
#print $who;
$rec = {};
$HoH{$who} = $rec;
for $field ( split ) {
($key, $value) = split /=/, $field;
$rec->{$key} = $value;
}
}
return %HoH;
}
#
#end file2hash
#
#
#print out hash table in k<file_name> format
#
sub hash2print{
(my %HoH,my $debug) = #_;
#my ($debug)=#_||0;
#my %HoH = shift;
#my $debug = shift || 0;
my $family;
my $role;
for $family ( keys %HoH ) {
#print "$family\n";
for $role ( keys %{ $HoH{$family} } ) {
if ($debug){
print "family:$family\n";
print "role: $role\n";
}
print "$role=$HoH{$family}{$role}";
}
print "\n";
}
}
#
#end hash2print
#
sub dispatch{
my $inc= shift;
my $config_f = shift || "kconfig";
my $memory_f = shift || "kmem";
my %h2=&file2hash($config_f);
my %m2=file2hash($memory_f);
my $today=&getDate();
print "$today\n";
print "$inc\n";
my $inc_cnt = $m2{$today}{$inc} || -999999999;
print "$inc_cnt\n";
#my %config = shift;
#my %mem = shift;
#my $event = shift;
#print $m2{$inc}{$today};
}
sub getDate{
my $date = strftime "%m/%d/%Y", localtime; # "
#print $date;
return $date;
}
my %h2=&file2hash("kconfig");
my %m2=&file2hash("kmem");
&hash2print(%h2,1);
&hash2print(%m2,1);
#print &getDate();
#my $xcnt= &dispatch("event_c3_z2");
#&dispatch("event_c3_z2");
#print $xcnt;
Test file1:
event_a1_x1: email1=ackee0000#gmail.com email2=kym018#gmail.com email1_cnt=6
event_a1_x2: email1=ackee0000#gmail.com email2=kym018#gmail.com email1_cnt=5
event_b2_y1: email1=ackee0000#gmail.com email2=kym018#gmail.com email1_cnt=4
event_b2_y2: email1=ackee0000#gmail.com email2=kym018#gmail.com email1_cnt=3
event_c3_z1: email1=ackee0000#gmail.com email2=kym018#gmail.com email1_cnt=2
event_c3_z2: email1=ackee0000#gmail.com email2=kym018#gmail.com email1_cnt=1
test file2:
201609230012: event_a1_x1=6
201609230744: event_a1_x2=5
201609230844: event_b2_y1=4
201609230342: event_b2_y2=3
201609230245: event_c3_z1=2
201609230100: event_c3_z2=1
Arguments are passed to a function as one flat list. On the other hand, a hash can be assigned a list, %h = qw(a b), where consecutive elements form key-value pairs, $h{a} is 'b'. So when a hash is the first variable that receives arguments in a function it scoops up all of them. Here is a recent post on how a function returns, where the exact same story applies.
So everything is assigned to the hash, the remaining scalar as well. Thus the hash gets one more element than intended and ends up with an odd number of elements.
A solution -- in this case, only to pass hash by reference, as in the answer by Inferno
sub hash2print{
my ($rHoH, $debug) = #_;
my %HoH = %$rHoH;
# ...
}
hash2print(\%h2, 1);
It is in principle a good idea to pass lists by reference, unless they are very short.
In general you can pass the scalar first then the hash
sub hash2print{
my ($value, %HoH) = #_;
# ...
}
hash2print(1, %h2);
But in your case this doesn't go since the $debug is optional, and if we leave it out when calling the function the first key of the hash would wind up in $value.
A few other comments
Don't use globals unless there is a non-negotiable reason for that. Declare in small scope.
You don't have to do $rec = {};, can just declare it my $rec;.
In general, don't put & in front of a function call but use just file2hash(...);
You want to pass your hashes by value to your subroutine and this creates the problem. Try to pass your hash %h2 by reference instead (notice the \ before the %):
&hash2print(\%h2, 1);
Then in your sub hash2print, you can get the hash back in the following way:
sub hash2print {
(my $hashref, my $debug) = #_;
my %HoH = %$hashref; # dereference $hashref to get back the hash
...
You can read more about references here if you don't understand the concepts behind them.

rearrange data from one column to a row

I have the below data and I need to make the second column as the header. Any help is appreciated.
Data:
IBM,Voltality,7,73894756.93897434897
IBM,Market,100,983874.34324
GOOG,Sanity,15,8932748
GOOG,Rate Jump,25,873476378.234234
MBLY,Market,340,23423423432.6783
Output:
PRODUCT|Market|Rate Jump|Sanity|Voltality
IBM|100,983874.34324|||7,73894756.93897434897
GOOG||25,873476378.234234|15,8932748|||
MBLY|340,23423423432.6783|||
Code (incomplete / not sure hot to get to the end):
#!/usr/bin/perl
use strict;
use Getopt::Long;
use warnings;
use Data::Dumper;
my $valsep = ',';
my ( %type, %keys, %ccy, %cnt, %avg );
while (<>) {
chomp;
my ( $product, $reason, $count, $lat ) = split /,/;
my $key = "$product,$reason";
if ( not exists( $type{$reason} ) ) {
$type{$reason} = $reason;
}
$ccy{$key} = $product;
$cnt{$key} = $count;
$avg{$key} = $lat;
}
close(INPUT);
print Dumper ( \%ccy );
print Dumper ( \%type );
my ( %pair, %details );
foreach my $rows ( sort keys %ccy ) {
print "the key is : $rows and $ccy{$rows}\n";
foreach my $res ( sort keys %type ) {
print "The type is : $res and $type{$res}\n";
}
}
You just need to keep track of your columns and row data when parsing the data structure.
The following demonstrates:
#!/usr/bin/perl
use strict;
use warnings;
my $fh = \*DATA;
my %columns;
my %rows;
while (<$fh>) {
chomp;
my ( $company, $col, $vals ) = split ',', $_, 3;
# Track Columns for later labeling
$columns{$col}++;
$rows{$company}{$col} = $vals;
}
my #columns = sort keys %columns;
# Header
print join( '|', 'PRODUCT', #columns ), "\n";
for my $company ( sort keys %rows ) {
print join( '|', $company, map { $_ // '' } #{ $rows{$company} }{#columns} ), "\n";
}
__DATA__
IBM,Voltality,7,73894756.93897434897
IBM,Market,100,983874.34324
GOOG,Sanity,15,8932748
GOOG,Rate Jump,25,873476378.234234
MBLY,Market,340,23423423432.6783
Outputs:
PRODUCT|Market|Rate Jump|Sanity|Voltality
GOOG||25,873476378.234234|15,8932748|
IBM|100,983874.34324|||7,73894756.93897434897
MBLY|340,23423423432.6783|||
The following code will do the job; rather than using several hashes, I've put all the data in a hash of hashes. I've put comments in the script to explain what is happening in case you are not sure. You can, of course, delete them in your script.
#!/usr/bin/perl
use warnings;
use strict;
my %market;
while (<DATA>) {
next unless /\w/;
# remove line endings
chomp;
# split line by commas -- only split into three parts
my #col = split ",", $_, 3;
# save the data as $market{col0}{col1} = col2
$market{$col[0]}{$col[1]} = $col[2];
}
# create an output file
my $outfile = 'output.txt';
open( my $fh, ">", $outfile ) or die "Could not open $outfile: $!";
my #headers = ('Market','Rate Jump','Sanity','Volatility');
# print out the header line, joined by |
print { $fh } join('|', 'PRODUCT', #headers) . "\n";
# for each product in the market data
for my $p (sort keys %market) {
# print the product name
print { $fh } join('|', $p,
# go through the headers using map (map acts like a "for" loop)
# if the relevant property exists in the market data, print it;
# if not, print nothing
map { $market{$p}{$_} // '' } #headers) . "\n";
}
# this is the input data. You might be reading yours in from a file
__DATA__
IBM,Voltality,7,73894756.93897434897
IBM,Market,100,983874.34324
GOOG,Sanity,15,8932748
GOOG,Rate Jump,25,873476378.234234
MBLY,Market,340,23423423432.6783
Output:
PRODUCT|Market|Rate Jump|Sanity|Volatility
GOOG||25,873476378.234234|15,8932748|
IBM|100,983874.34324|||7,73894756.93897434897
MBLY|340,23423423432.6783|||

slicing and array into a hasn value

What I was trying to do was combine elements[1..3] into a single array, and then make the has out of that. Then sort by keys and print out the whole thing.
#!/usr/bin/perl
my %hash ;
while ( <> ) {
#elements = split /,/, $_;
#slice_elements = #elements[1..3] ;
if ($elements[0] ne '' ) {
$hash{ $elements[0] } = $slice_elements[0];
}
}
foreach $key (sort keys %hash ) {
print "$key; $hash{$key}\n";
}
This is what I get when I print this out -
casper_mint#casper-mint-dell /tmp $ /tmp/dke /tmp/File1.csv
060001.926941; TOT
060002.029434; RTP
060002.029568; RTP
060002.126895; UL
060002.229327; RDS/A
060002.312512; EON
060002.429382; RTP
060002.585408; BCS
060002.629333; LYG
060002.712240; HBC
This is waht I want the elements of the array - element[0] is the key and element[1..3] in the value
060001.926941,TOT,86.26,86.48
060002.029434,RTP,310.0,310.66
060002.029568,RTP,310.0,310.74
060002.126895,UL,34.06,34.14
060002.229327,RDS/A,84.47,84.72
060002.312512,EON,56.88,57.04
060002.429382,RTP,310.08,310.77
060002.585408,BCS,58.96,59.06
060002.629333,LYG,46.13,46.41
060002.712240,HBC,93.06,93.23
Always include use strict; and use warnings; at the top of EVERY perl script.
What you need is to create a new anonymous array [ ] as the value to your hash. Then join the values when displaying the results:
#!/usr/bin/perl
use strict;
use warnings;
my %hash;
while (<>) {
chomp;
my #elements = split /,/, $_;
if ($elements[0] ne '' ) {
$hash{ $elements[0] } = [#elements[1..3]];
}
}
foreach my $key (sort keys %hash ) {
print join(',', $key, #{$hash{$key}}) . "\n";
}
Of course, if your data really is fixed width like that, and you're not actually doing anything with the values, there actually is no need to split and join. The following would do the same thing:
use strict;
use warnings;
print sort <>;