Perl How to build a Dynamic multi level hash lookup - perl

I have a code block that I use many times with slight variations, and I am trying to turn it into a subroutine.
This code block completes configuration templates (router interface, VRF, other network stuff).
It does so by looking up data in a hash data structure (called %config_hash) that is built by ingesting an Excel file. The data that is looked up is in different areas of the hash for different templates.
An example of the current working code is this:
my @temp_source_template = @{ clone ($source_template{$switch_int_template}) };
my %regex_replacements=(); ## hash for holding regex search and replace values, keys are !name! (look in template files) values taken from DCAP
my @regex_key =(); ## temp array used for when more than one !name! on a line
my $find_string='';
foreach my $line (@temp_source_template){
my (@regex_key) = ( $line =~ /(\!.*?\!)/g ); ## match needs to be non greedy thus .*? not .*
foreach my $hash_refs (@regex_key){
my $lookup = $hash_refs =~ s/!//gri; ## remove ! from !name! so lookup can be done in DCAP file hash
my $excel_lookup = $lookup =~ s/_/ /gri;
$regex_replacements{$hash_refs} = $config_hash{'Vlan'}{$inner}{$excel_lookup}; ## lookup DCAP file hash a write value to regex hash
if (undef eq $regex_replacements{$hash_refs}){
$regex_replacements{$hash_refs} = $config_hash{'Switch'}{$outer}{$excel_lookup};
}
if (undef eq $regex_replacements{$hash_refs}){
$regex_replacements{$hash_refs} = $config_hash{'VRF'}{$middle}{$excel_lookup};
}
$find_string= $find_string . $hash_refs . '|' ;
}
}
So this creates a hash (%regex_replacements) whose keys are the placeholders to look for and whose values are the replacements. It also builds a string to be used in a regular expression ($find_string). Different templates will have different hash lookup "paths" (e.g. $config_hash{'Switch'}{$outer}{$excel_lookup}), or the same paths in a different order (effectively a most-specific match).
for completeness here is the code block that does the regex replacements:
foreach my $line (@temp_source_template){
my (@line_array) = split /(![A-Za-z_]*!)/, $line;
foreach my $chunk (@line_array){
my $temp_chunk = $chunk;
$chunk =~ s/($find_string)/$regex_replacements{$1}/gi;
if (!($chunk)){
$chunk = $temp_chunk;
}
}
$line = join ("", @line_array);
if ($line =~ /\!.*\!/){
print {$log} " ERROR line has unmatched variables deleting line \"$line\"\n";
$line ="";
}
}
So I did some searching and I found this:
Perl: How to turn array into nested hash keys
Which is almost exactly what I want, but I can't get it to work because my variable reference is a hash and its hash variable reference is just "REF", so I get errors for trying to use a hash as a reference.
I won't post what I have tried, as I don't really understand the magic of that link.
But what I am doing is passing the following to the sub:
my @temp_source_template = @{ clone ($source_template{$test}) };
my @search_array = ( ['VRF' ,$middle] , ['Switch' ,$outer]);
my ($find_string, $completed_template) = dynamic_regex_replace_fine_string_gen(\%source_config,\@temp_source_template, \@search_array);
and I want the $find_string and the regex_replacements hash ref returned. It should be noted that in the sub I need to append the value of $excel_lookup onto the end of each element of @search_array.
The bit that I don't understand how to do is build the variable-level hash lookup.

You could try Data::Diver; it provides simple access to elements of deeply nested structures.
For example:
use feature qw(say);
use strict;
use warnings;
use Data::Diver qw(Dive);
my $hash = { a => { b => 1, c => { d => 2 }}};
my $keys = [[ 'a', 'b'], ['a','c','d']];
lookup_keys( $hash, $keys );
sub lookup_keys {
my ( $hash, $keys ) = @_;
for my $key ( @$keys ) {
my $value = Dive( $hash, @$key );
say $value;
}
}
Output:
1
2
See Also:
Creating hash of hash dynamically in perl
Read config hash-like data into perl hash
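For the "most specific match wins" lookup described in the question, a minimal core-only sketch could look like the following. It does not use Data::Diver; the helper names dive_path and first_defined_lookup and the sample data are hypothetical, chosen only for illustration. Each search path is tried in order with the leaf key appended, and the first defined value is returned.

```perl
use strict;
use warnings;

# Hypothetical helper: follow a list of keys down a nested hash.
# Returns undef as soon as a level is missing or is not a hash.
sub dive_path {
    my ( $node, @keys ) = @_;
    for my $key (@keys) {
        return undef unless ref $node eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

# Hypothetical helper: try each path prefix in order, with the leaf key
# appended, and return the first defined value found.
sub first_defined_lookup {
    my ( $config, $search_paths, $leaf ) = @_;
    for my $path (@$search_paths) {
        my $value = dive_path( $config, @$path, $leaf );
        return $value if defined $value;
    }
    return undef;
}

# Sample data (made up for this sketch).
my %config_hash = (
    Vlan   => { 100 => { 'IP Address' => '10.0.0.1' } },
    Switch => { sw1 => { 'IP Address' => '192.0.2.1', 'Hostname' => 'core-sw1' } },
);
my @search_array = ( [ 'Vlan', 100 ], [ 'Switch', 'sw1' ] );

print first_defined_lookup( \%config_hash, \@search_array, 'IP Address' ), "\n";
print first_defined_lookup( \%config_hash, \@search_array, 'Hostname' ),   "\n";
```

Changing the order of the entries in @search_array changes which level takes priority, so each template can supply its own list.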

Related

Accessing a multi-dimensional hash using strings

I have a large multi-dimensional hash which is an import of a JSON structure.
my %bighash;
There is an element in %bighash called:
$bighash{'core'}{'dates'}{'year'} = 2019;
I have a separate string variable called core.dates.year which I would like to use to extract 2019 from %bighash.
I've written this code:
my @keys = split(/\./, 'core.dates.year');
my %hash = ();
my $hash_ref = \%hash;
for my $key ( @keys ){
$hash_ref->{$key} = {};
$hash_ref = $hash_ref->{$key};
}
which when I execute:
say Dumper \%hash;
outputs:
$VAR1 = {
'core' => {
'dates' => {
'year' => {}
}
}
};
All good so far. But what I now want to do is say:
print $bighash{\%hash};
Which I want to return 2019. But nothing is being returned, or I'm seeing an error about "Use of uninitialized value within %bighash in concatenation (.) or string at script.pl line 1371, line 17 (#1)"...
Can someone point me into what is going on?
My project involves embedding strings in an external file which is then replaced with actual values from %bighash so it's just string interpolation.
Thanks!
Can someone point me into what is going on [when I use $bighash{\%hash}]?
Hash keys are strings, and the stringification of \%hash is something like HASH(0x655178). The only element in %bighash has core —not HASH(0x655178)— for key, so the hash lookup returns undef.
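A small sketch of that stringification, using made-up data in place of the real JSON import:

```perl
use strict;
use warnings;

my %bighash = ( core => { dates => { year => 2019 } } );
my %hash    = ( core => { dates => { year => {} } } );

# Using a reference as a hash key silently turns it into a string.
my $key = "" . \%hash;
print "$key\n";    # something like HASH(0x655178); the address varies

# No element of %bighash has that string as its key.
print exists $bighash{$key} ? "found\n" : "not found\n";
```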
Useful tools:
sub dive_val :lvalue { my $p = \shift; $p = \( $$p->{$_} ) for @_; $$p } # For setting
sub dive { my $r = shift; $r = $r->{$_} for @_; $r } # For getting
dive_val(\%hash, split /\./, 'core.dates.year') = 2019;
say dive(\%hash, split /\./, 'core.dates.year');
Hash::Fold would seem to be helpful here. You can "flatten" your hash and then access everything with a single key.
use Hash::Fold 'flatten';
my $flathash = flatten(\%bighash, delimiter => '.');
print $flathash->{"core.dates.year"};
There are no multi-dimensional hashes in Perl. Hashes are key/value pairs. Your understanding of Perl data structures is incomplete.
Re-imagine your data structure as follows
my %bighash = (
core => {
dates => {
year => 2019,
},
},
);
There is a difference between the round parentheses () and the curly braces {}. The % sigil on the variable name indicates that it's a hash, that is a set of unordered key/value pairs. The round () are a list. Inside that list are two scalar values, i.e. a key and a value. The value is a reference to another, anonymous, hash. That's why it has curly {}.
Each of those levels is a separate, distinct data structure.
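As a quick illustration (same sample data as above), each level can be inspected and held on its own, because every nested value is just a reference to another hash:

```perl
use strict;
use warnings;

my %bighash = ( core => { dates => { year => 2019 } } );

print ref( $bighash{core} ),        "\n";    # HASH
print ref( $bighash{core}{dates} ), "\n";    # HASH

# Grab the innermost hash by reference and change it through that reference.
my $dates = $bighash{core}{dates};
$dates->{year} = 2020;

print $bighash{core}{dates}{year}, "\n";     # 2020, same underlying hash
```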
This rewrite of your code is similar to what ikegami wrote in his answer, but less efficient and more verbose.
my @keys = split( /\./, 'core.dates.year' );
my $value = \%bighash;
for my $key (@keys) {
$value = $value->{$key};
}
print $value;
It drills down step by step into the structure and eventually gives you the final value.
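Since the stated goal is replacing embedded strings with values from %bighash, here is a hedged sketch that combines the drill-down with a substitution. The [% dotted.path %] placeholder syntax and the names lookup_path and fill_template are assumptions made for this example, not something from the question; unknown paths are left untouched.

```perl
use strict;
use warnings;

my %bighash = ( core => { dates => { year => 2019 } } );

# Hypothetical helper: resolve a dotted path such as 'core.dates.year'.
sub lookup_path {
    my ( $data, $path ) = @_;
    my $node = $data;
    for my $key ( split /\./, $path ) {
        return undef unless ref $node eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

# Hypothetical helper: replace every [% path %] placeholder in $text.
sub fill_template {
    my ( $data, $text ) = @_;
    $text =~ s{\[%\s*([\w.]+)\s*%\]}{
        my $path  = $1;
        my $value = lookup_path( $data, $path );
        defined $value ? $value : "[% $path %]";
    }ge;
    return $text;
}

print fill_template( \%bighash,
    "Report for [% core.dates.year %], owner [% core.owner %]\n" );
```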

Interpolating a non-interpolated passed string inside a subroutine in Perl

I am looking to parse a tab delimited text file into a nested hash with a subroutine. Each file row will be keyed by a unique id from a uid column(s), with the header row as nested keys. Which column(s) is(are) to become the uid changes (as sometimes there isn't a unique column, so the uid has to be a combination of columns). My issue is with the $uid variable, which I pass as a non-interpolated string. When I try to use it inside the subroutine in an interpolated way, it will only give me the non-interpolated value:
use strict;
use warnings;
my $lofrow = tablehash($lof_file, '$row{gene}', "transcript", "ENST");
##sub to generate table hash from file w/ headers
##input values are file, uid, header starter, row starter, max column number
##returns hash reference (deref it)
sub tablehash {
my ($file, $uid, $headstart, $rowstart, $colnum) = @_;
if (!$colnum){ # takes care of a unknown number of columns
$colnum = 0;
}
open(INA, $file) or die "failed to open $file, $!\n";
my %table; # permanent hash table
my %row; # hash of column values for each row
my @names = (); # column headers
my @values = (); # line/row values
while (chomp(my $line = <INA>)){ # reading lines for lof info
if ($line =~ /^$headstart/){
@names = split(/\t/, $line, $colnum);
} elsif ($line =~ /^$rowstart/){ # splitting lof info columns into variables
@values = split(/\t/, $line, $colnum);
@row{@names} = @values;
print qq($uid\t$row{gene}\n); # problem: prints "$row{gene} ACB1"
$table{"$uid"} = { %row }; # puts row hash into permanent hash, but with $row{gene} key)
}
}
close INA;
return \%table;
}
I am out of ideas. I could put $table{$row{$uid}} and simply pass "gene", but in a couple of instances I want to have a $uid of "$row{gene}|$row{rsid}" producing $table{ACB1|123456}
Interpolation is a feature of the Perl parser. When you write something like
"foo $bar baz"
Perl compiles it into something like
'foo ' . $bar . ' baz'
It does not interpret data at runtime.
What you have is a string where one of the characters happens to be $ but that has no special effect.
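A tiny sketch of the difference, with a made-up %row:

```perl
use strict;
use warnings;

my %row = ( gene => 'ACB1' );

my $uid = '$row{gene}';    # single quotes: just 10 ordinary characters

print "$uid\n";            # prints the literal text $row{gene}
print "$row{gene}\n";      # interpolated where the string is written: ACB1
```

Interpolating $uid only inserts its contents; Perl does not scan the result again for further variables.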
There are at least two possible ways to do something like what you want. One of them is to use a function, not a string. (Which makes sense because interpolation really means concatenation at runtime, and the way to pass code around is to wrap it in a function.)
my $lofrow = tablehash($lof_file, sub { my ($row) = @_; $row->{gene} }, "transcript", "ENST");
sub tablehash {
my ($file, $mkuid, $headstart, $rowstart, $colnum) = @_;
...
my $uid = $mkuid->(\%row);
$table{$uid} = { %row };
Here $mkuid isn't a string but a reference to a function that (given a hash reference) returns a uid string. tablehash calls it, passing a reference to %row to it. You can then later change it to e.g.
my $lofrow = tablehash($lof_file, sub { my ($row) = @_; "$row->{gene}|$row->{rsid}" }, "transcript", "ENST");
Another solution is to use what amounts to a template string:
my $lofrow = tablehash($lof_file, "gene|rsid", "transcript", "ENST");
sub tablehash {
my ($file, $uid_template, $headstart, $rowstart, $colnum) = @_;
...
(my $uid = $uid_template) =~ s/(\w+)/$row{$1}/g;
$table{$uid} = { %row };
The s/// code goes through the template string and manually replaces every word by the corresponding value from %row.
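Both ideas can be tried without a file. The sketch below works on in-memory rows; index_rows and uid_from_template are hypothetical names, and the template is wrapped in a closure so that one code path serves both the callback and the template style.

```perl
use strict;
use warnings;

# Made-up rows standing in for parsed file lines.
my @rows = (
    { gene => 'ACB1', rsid => '123456' },
    { gene => 'XYZ2', rsid => '777' },
);

# Key each row by whatever the supplied callback returns.
sub index_rows {
    my ( $rows, $mkuid ) = @_;
    my %table;
    for my $row (@$rows) {
        my $uid = $mkuid->($row);
        $table{$uid} = {%$row};    # store a shallow copy of the row
    }
    return \%table;
}

# Turn a template such as 'gene|rsid' into a uid-building callback.
sub uid_from_template {
    my ($template) = @_;
    return sub {
        my ($row) = @_;
        ( my $uid = $template ) =~ s/(\w+)/$row->{$1}/g;
        return $uid;
    };
}

my $by_gene = index_rows( \@rows, sub { $_[0]{gene} } );
my $by_both = index_rows( \@rows, uid_from_template('gene|rsid') );

print join( ',', sort keys %$by_gene ), "\n";
print join( ',', sort keys %$by_both ), "\n";
```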
Random notes:
Bonus points for using strict and warnings.
if (!$colnum) { $colnum = 0; } can be simplified to $colnum ||= 0;.
Use lexical variables instead of bareword filehandles. Barewords are effectively global variables (and syntactically awkward because they're not first-class citizens of the language).
Always use the 3-argument form of open to avoid unexpected interpretation of the second argument.
Include the name of your program in error messages (either explicitly with $0 or implicitly by omitting \n from die).
my @foo = (); my %bar = (); is redundant and can be simplified to my @foo; my %bar;. Arrays and hashes start out empty; overwriting them with an empty list is pointless.
chomp(my $line = <INA>) will throw a warning when you reach EOF (because you're trying to chomp a variable containing undef).
my %row; should probably be declared inside the loop. It looks like it's supposed to only contain values from the current line.
Suggestion:
open my $fh, '<', $file or die "$0: can't open $file: $!\n";
while (my $line = readline $fh) {
chomp $line;
...
}

Parse report in blocks to CSV

I have lots of data dumps in a pretty huge amount of data structured as follow
Key1:.............. Value
Key2:.............. Other value
Key3:.............. Maybe another value yet
Key1:.............. Different value
Key3:.............. Invaluable
Key5:.............. Has no value at all
Which I would like to transform to something like:
Key1,Key2,Key3,Key5
Value,Other value,Maybe another value yet,
Different value,,Invaluable,Has no value at all
I mean:
Generate a collection of all the keys
Generate a header line with all the Keys
Map all the values to their correct "columns" (notice that in this example I have no "Key4", and Key3/Key5 interchanged)
Possibly in Perl, since it would be easier to use in various environments.
But I am not sure if this format is unusual, or if there is a tool that already does this.
This is fairly easy using hashes and the Text::CSV_XS module:
use strict;
use warnings;
use Text::CSV_XS;
my @rows;
my %headers;
{
local $/ = "";
while (<DATA>) {
chomp;
my %record;
for my $line (split(/\n/)) {
next unless $line =~ /^([^:]+):\.+\s(.+)/;
$record{$1} = $2;
$headers{$1} = $1;
}
push(@rows, \%record);
}
}
unshift(@rows, \%headers);
my $csv = Text::CSV_XS->new({binary => 1, auto_diag => 1, eol => $/});
$csv->column_names(sort(keys(%headers)));
for my $row_ref (@rows) {
$csv->print_hr(*STDOUT, $row_ref);
}
__DATA__
Key1:.............. Value
Key2:.............. Other value
Key3:.............. Maybe another value yet
Key1:.............. Different value
Key3:.............. Invaluable
Key5:.............. Has no value at all
Output:
Key1,Key2,Key3,Key5
Value,"Other value","Maybe another value yet",
"Different value",,Invaluable,"Has no value at all"
If your CSV format is 'complicated' - e.g. it contains commas, etc. - then use one of the Text::CSV modules. But if it isn't - and this is often the case - I tend to just work with split and join.
What's useful in your scenario, is that you can map key-values within a record quite easily using a regex. Then use a hash slice to output:
#!/usr/bin/env perl
use strict;
use warnings;
#set paragraph mode - records are blank line separated.
local $/ = "";
my @rows;
my %seen_header;
#read STDIN or files on command line, just like sed/grep
while ( <> ) {
#multi - line pattern, that matches all the key-value pairs,
#and then inserts them into a hash.
my %this_row = m/^(\w+):\.+ (.*)$/gm;
push ( @rows, \%this_row );
#add the keys we've seen to a hash, so we 'know' what we've seen.
$seen_header{$_}++ for keys %this_row;
}
#extract the keys, make them unique and ordered.
#could set this by hand if you prefer.
my @header = sort keys %seen_header;
#print the header row
print join ",", @header, "\n";
#iterate the rows
foreach my $row ( @rows ) {
#use a hash slice to select the values matching @header.
#the map is so any undefined values (missing keys) don't report errors, they
#just return blank fields.
print join ",", map { $_ // '' } @{$row}{@header},"\n";
}
For your sample input, this produces:
Key1,Key2,Key3,Key5,
Value,Other value,Maybe another value yet,,
Different value,,Invaluable,Has no value at all,
If you want to be really clever, then most of that initial building of the loop can be done with:
my @rows = map { { m/^(\w+):\.+ (.*)$/gm } } <>;
The problem then is that you would still need to build up the 'headers' array, and that gets a bit more complicated:
$seen_header{$_}++ for map { keys %$_ } @rows;
It works, but I don't think it's as clear about what's happening.
However the core of your problem may be the file size - that's where you have a bit of a problem, because you need to read the file twice - first time to figure out which headings exist throughout the file, and then second time to iterate and print:
#!/usr/bin/env perl
use strict;
use warnings;
open ( my $input, '<', 'your_file.txt') or die $!;
local $/ = "";
my %seen_header;
while ( <$input> ) {
$seen_header{$_}++ for m/^(\w+):/gm;
}
my @header = sort keys %seen_header;
#return to the start of file:
seek ( $input, 0, 0 );
while ( <$input> ) {
my %this_row = m/^(\w+):\.+ (.*)$/gm;
print join ",", map { $_ // '' } @this_row{@header},"\n";
}
This will be slightly slower, as it'll have to read the file twice. But it won't use nearly as much memory footprint, because it isn't holding the whole file in memory.
Unless you know all your keys in advance, and you can just define them, you'll have to read the file twice.
This seems to work with the data you've given
use strict;
use warnings 'all';
my %data;
while ( <> ) {
next unless /^(\w+):\W*(.*\S)/;
push @{ $data{$1} }, $2;
}
use Data::Dump;
dd \%data;
output
{
Key1 => ["Value", "Different value"],
Key2 => ["Other value"],
Key3 => ["Maybe another value yet", "Invaluable"],
Key5 => ["Has no value at all"],
}

Perl list all keys in hash with identical values

If I have a colon-delimited file name FILE and I do:
cat FILE|perl -F: -lane 'my %hash = (); $hash{@F[0]} = @F[2]'
to assign the first and 3rd tokens as the key => value pairs for the hash..
1) Is that a sane way to assign key value pairs to a hash?
2) What is the simplest way to now find all keys with shared values and list them?
Assume FILE looks like:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male
Desired Output: Keys with value "Apple": Mike,Jared,David,Sam
Your example won't work as you want because the -n option puts a while loop around your one-line program, so the hash you declare is created and destroyed for every record in the file. You could get around that by not declaring the hash, and so making it a persistent package variable which will retain all values stored in it.
You can then write push @{ $hash{$F[2]} }, $F[0] but notice that it should be $F[0] etc. and not @F[0], and I have used push to create a list of column 1 values for each column 3 value instead of just a list of one-to-one values relating each column 1 value with its column 3 value.
To clarify, your method produces a hash looking like this, which has to be searched to produce the display that you want.
(
Beth => "Maize",
David => "Apple",
Don => "Corn",
Jared => "Apple",
Mike => "Apple",
Sam => "Apple",
)
while mine creates this, which as you can see is pretty much already in the form you want.
(
Apple => ["Mike", "Jared", "Sam", "David"],
Corn => ["Don"],
Maize => ["Beth"],
)
But I think this problem is a bit too big to be solved with a one-line Perl program. The solution below expects the path to the input file as a command-line parameter, like this
> perl prog.pl colons.csv
but it will default to myfile.csv if no file is specified.
use strict;
use warnings;
our @ARGV = 'myfile.csv' unless @ARGV;
my %data;
while (<>) {
my @fields = split /:/;
push @{ $data{$fields[2]} }, $fields[0];
}
while (my ($k, $v) = each %data) {
next unless @$v > 1;
printf qq{Keys with value "%s": %s\n}, $k, join ', ', @$v;
}
output
Keys with value "Apple": Mike, Jared, Sam, David
use strict;
use warnings;
open my $in, '<', 'in.txt' or die "Cannot open in.txt: $!";
my %data;
while(<$in>){
chomp;
my @split = split/:/;
$data{$split[0]} = $split[2];
}
my $query = 'Apple';
print "Keys with value $query = ";
foreach my $name (keys %data){
print "$name " if $data{$name} eq $query;
}
print "\n";
Arrays are used to hold lists of values, so use an array.
perl -F: -lane'
push @{ $h{$F[2]} }, $F[0];
END {
for my $fruit (keys %h) {
next if @{ $h{$fruit} } < 2;
print "$fruit: ", join(",", @{ $h{$fruit} });
}
}
' FILE
The END block is executed on exit. In it, we iterate over the keys of the hash. If the value of the current hash element is an array with only one element, it's skipped. Otherwise, we print the key followed by the contents of the array referenced by the hash element.
Here is another way:
perl -F: -lane'
push @{ $h{$F[2]} }, $F[0];
}{
print "$_: ", join(",", @{ $h{$_} }) for grep { @{$h{$_}} > 1 } keys %h;
' file
We read each line and create hash of arrays using third column as key and first column as list of values for matching key. In the END block we iterate over our hash using grep and filter keys whose array count greater than 1 and print the key followed by array elements.
It doesn't have to be a one liner,
Good. It's not going to be...
Is that a sane way to assign key value pairs to a hash?
You're simply assigning the key value pairs as:
$hash{"key"} = "value";
Which is about as simple as it gets. There might be a way of doing it via map. However, the main issue I see is what should happen if you have duplicate keys.
Let's say your file looks like this:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male # Note this entry is here twice!
David:35:Wheat:Male # Note this entry is here twice!
Let's do a simple assignment loop:
my %hash;
while ( my $line = <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
}
When you get to $hash{David}, it will first be set to Apple, but then you change the value to Wheat. There are four ways you can handle this:
Use whatever the last value is. No change in the loop.
Use the first value and ignore subsequent values. Simple enough to do.
If that happens, it's an error. Abort the program and report the error.
Keep all values.
This last one is the most interesting because it involves a reference to an array as the values for your hash:
my %hash;
while ( my $line = <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = [] if not exists $hash{$name}; # I'm making this an array reference
push @{ $hash{$name} }, $category;
}
Now, each value in my hash is a reference to an array:
my @values = @{ $hash{David} }; # The values of David...
print "David is in categories " . join ( ", ", @values ) . "\n";
This will print out David is in categories Apple, Wheat
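For comparison, the other three policies from the list above (last value wins, first value wins, treat duplicates as an error) could be sketched like this. The sample lines are made up and held in an array instead of being read from a file.

```perl
use strict;
use warnings;

my @lines = (
    'David:34:Apple:Male',
    'David:35:Wheat:Male',
    'Beth:56:Maize:Female',
);

my ( %last, %first );
for my $line (@lines) {
    my ( $name, undef, $category ) = split /:/, $line;
    $last{$name} = $category;       # last value wins: plain assignment
    $first{$name} //= $category;    # first value wins: assign only if unset
}

# Duplicates are an error: die on the second sighting of a name.
my %strict;
my $ok = eval {
    for my $line (@lines) {
        my ( $name, undef, $category ) = split /:/, $line;
        die "duplicate key '$name'\n" if exists $strict{$name};
        $strict{$name} = $category;
    }
    1;
};

print "last: $last{David}, first: $first{David}\n";
print "strict load failed: $@" unless $ok;
```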
What is the simplest way to now find all keys with shared values and list them?
The easiest way is to create a second hash that's keyed by your value. In this hash, you will need to use an array reference. Let's assume no duplicate names for now:
my %hash;
my %indexed_hash;
while ( my $line = <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
$indexed_hash{$category} = [] if not exists $indexed_hash{$category};
push @{ $indexed_hash{$category} }, $name;
}
Now, if I want to find all the duplicates of Apple:
my @names = @{ $indexed_hash{Apple} };
print "The following are in 'Apple': " . join ( ", ", @names ) . "\n";
Since we're getting into references, we could take things a step further and store all of your values of your file in your hash. Again, for simplicity, I am assuming that you will have one and only one entry per name:
my %hash;
while ( my $line = <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name}->{AGE} = $age;
$hash{$name}->{CATEGORY} = $category;
$hash{$name}->{SEX} = $sex;
}
for my $name ( sort keys %hash ) {
print "$name Information:\n";
print " Age: " . $hash{$name}->{AGE} . "\n";
printf "Category: %s\n", $hash{$name}->{CATEGORY};
print " Sex: @{[$hash{$name}->{SEX}]}\n\n";
}
The last two statements show easier ways of interpolating complex data structures into a string. The printf is fairly clear. The second, @{[...]}, is a neat little trick.
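A short sketch of both techniques on made-up data. The @{[ ... ]} form builds an anonymous array from any expression and immediately dereferences it, which lets an arbitrary expression appear inside a double-quoted string:

```perl
use strict;
use warnings;

my %hash = ( David => { AGE => 34, SEX => 'Male' } );
my $name = 'David';

# printf keeps the expression outside the string.
printf "Age: %s\n", $hash{$name}->{AGE};

# @{[ ... ]} evaluates the expression and interpolates the result.
print "Sex: @{[ $hash{$name}->{SEX} ]}\n";
print "Age next year: @{[ $hash{$name}->{AGE} + 1 ]}\n";
```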
What have you tried?
If you reverse the hash into a list of value => key pairs and then use List::Util's pairs() on that list, you can transform the hash into a hash of value => key arrayrefs, i.e. ( foo => [ 'bar', 'baz' ] ). Then grep {@{$hash{$_}} > 1} keys %hash and print the results.
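A minimal sketch of that inversion follows. Note that it uses a plain loop with push instead of List::Util's pairs(), which is a swapped-in simplification; the name => value hash is sample data matching the question's file.

```perl
use strict;
use warnings;

my %hash = (
    Mike => 'Apple', Don  => 'Corn',  Jared => 'Apple',
    Beth => 'Maize', Sam  => 'Apple', David => 'Apple',
);

# Invert name => value into value => [ names ].
my %keys_for;
push @{ $keys_for{ $hash{$_} } }, $_ for sort keys %hash;

# Report only the values shared by more than one key.
for my $value ( grep { @{ $keys_for{$_} } > 1 } sort keys %keys_for ) {
    printf qq{Keys with value "%s": %s\n}, $value,
        join( ',', @{ $keys_for{$value} } );
}
```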

Selectively counting delimited field values and creating a hash using map

I have a pipe delimited text file containing, among other things, a date and a number indicating the lines sequence elsewhere in the program. What I'm hoping to do is from that file create a hash using the year as the key and the value being the maximum sequence for that year (I essentially need to implement an auto-incremented key per year) e.g from
2000|1
2003|9
2000|5
2000|21
2003|4
I would end with a hash like:
%hash = {
2000 => 21,
2003 => 9
}
I've managed to split the file into the year and sequence parts (not very well I think) like so:
my @dates = map {
my @temp = split /\|/;
join "|", (split /\//, $temp[1])[-1], $temp[4] || 0; #0 because some records
#mightn't have a sequence
} @info;
Is there something succinct I could do to create a hash using that data?
Thanks
If I understand you, you were almost there. All you needed to do was return the key and value from map and sort by sequence instead of joining them.
my %hash =
map @$_,
sort { $a->[1] <=> $b->[1] }
map {
my @temp = split /\|/;
my $date = (split /\//, $temp[1])[-1];
my $seq = $temp[4] || 0; #0 because some records mightn't have a sequence
[ $date, $seq ]
} @info;
But just iterating through with for and setting hash only if the current sequence
is higher than the previous maximum for that date is probably a better idea.
Be careful with those {}; where you said
%hash = {
2000 => 21,
2003 => 9
}
you meant () instead (or to be assigning to a reference $hash), since the {} there create an anonymous hash and return a reference to it.
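A small sketch of the three forms side by side. The wrong form is run with the relevant warning category switched off only so the example stays quiet; with warnings on, Perl reports "Reference found where even-sized list expected".

```perl
use strict;
use warnings;

my %hash     = ( 2000 => 21, 2003 => 9 );    # a hash, built from a list
my $hash_ref = { 2000 => 21, 2003 => 9 };    # a reference to an anonymous hash

my %wrong;
{
    no warnings 'misc';
    %wrong = { 2000 => 21, 2003 => 9 };      # one stringified reference as the only key
}
my ($only_key) = keys %wrong;

print scalar( keys %hash ), "\n";            # 2
print $hash_ref->{2003}, "\n";               # 9
print scalar( keys %wrong ), " key: $only_key\n";
```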
Here's how you could write that .. not too sure why you want/need to use map (please explain)
#!/usr/bin/perl -w
use strict;
use warnings;
my %hash;
while(<DATA>) {
chomp();
my ($year,$sequence)=split('\|');
$sequence = 0 unless (defined ($sequence));
next if (exists $hash{$year} and $sequence < $hash{$year});
$hash{$year}=$sequence;
}
__DATA__
2000|1
2003|9
2000|5
2000|21
2003|4
I added the $sequence = 0 unless defined ($sequence); because of that comment in your snippet. I believe I might understand your intent there.. (either the input format is valid/consistent, or it is not ..)
map operates on each item in a list and builds a list of results to pass on. So, you can't really do the sort of checks you want (keep the maximum sequence value) as you go unless you build a scratch hash that winds up containing exactly the data you are trying to build as the return value of the map.
my %results = map {
my( $y, $s ) = split '[|]', $_;
seq_is_gt_year_seq( $y, $s )
? ( $y, $s )
: ();
} @year_pipe_seq;
To implement seq_is_gt_year_seq() we wind up having to build a temporary hash that stores each year and its max sequence value for lookup.
You should use an approach that builds the lookup incrementally, like a for or while loop.
map { BLOCK } LIST usually (unless BLOCK sometimes evaluates to an empty list) returns a list that is at least as large as LIST, and may not be the way to go if you want to simply overwrite duplicate keys with the latest data. Something like:
my %hash;
for (@info) {
my @temp = split /\|/;
my $key = (split /\//, $temp[1])[-1];
my $value = $temp[4] || 0;
$hash{$key} = $value unless defined $hash{$key} && $hash{$key}>=$value;
}
will work. The last line conditionally updates the hash table, which is something you can't do (or at least can't do very conveniently) inside a map statement.
If there's any chance you can perform this processing as the file is read, then I'd do it. Something like this:
my %year_count;
while (my $line = <$fh>){
chomp $line;
my ($year, $num) = split /\|/, $line;
if (!defined $year_count{$year} || $num > $year_count{$year}) {
$year_count{$year} = $num;
}
}
If you want to use an array, map isn't really the best choice (since you're not transforming the list, you're processing it down to something different). To be honest, the most sensible array processing would probably be the same as the above, but in a foreach instead:
my %year_count;
foreach my $line (@info){
my ($year, $num) = split /\|/, $line;
if (!defined $year_count{$year} || $num > $year_count{$year}) {
$year_count{$year} = $num;
}
}
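Since the stated goal is an auto-incremented sequence per year, here is a hedged end-to-end sketch on the question's sample data. The next_sequence helper is a hypothetical name for this example; it bumps the stored maximum and starts at 1 for a year that has not been seen.

```perl
use strict;
use warnings;

my @info = ( '2000|1', '2003|9', '2000|5', '2000|21', '2003|4' );

# Keep the highest sequence seen for each year.
my %max_seq;
for my $line (@info) {
    my ( $year, $seq ) = split /\|/, $line;
    $seq //= 0;    # some records might not have a sequence
    $max_seq{$year} = $seq
        if !defined $max_seq{$year} || $seq > $max_seq{$year};
}

# Hypothetical helper: hand out the next sequence number for a year.
sub next_sequence {
    my ( $max, $year ) = @_;
    return ++$max->{$year};
}

print "$_ => $max_seq{$_}\n" for sort keys %max_seq;
print "next for 2000: ", next_sequence( \%max_seq, 2000 ), "\n";
print "next for 2010: ", next_sequence( \%max_seq, 2010 ), "\n";
```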