Remove duplicates from a 2D array in perl - perl

I have a 2D array in perl whose data is coming as rows in html format from a DB like the data shown below:
<tr><td>Rafa</td><td>Nadal</td><td>Data1</td></tr>,
<tr><td>Goran</td><td>Ivan</td><td>Data2</td></tr>,
<tr><td>Leander</td><td>Paes</td><td>Data2</td></tr>,
<tr><td>Leander</td><td>Paes</td><td>Data2</td></tr>
i want to remove the duplicate rows from the array.
"<tr><td>Leander</td><td>Paes</td><td>Data2</td></tr>" should be removed in above case.
I tried the below piece of code, but it's not working out.
sub unique {
my %seen;
grep ! $seen{ join $;, #$_ }++, #_
}

First: you really should try not to use outdated Perl syntax and side effects.
Second: the answer depends on the data structure you generate from the input. Here are two example implementations:
#!/usr/bin/perl
use strict;
use warnings;
# 2D Array: list of array references
my #data = (
['Rafa', 'Nadal', 'Data1'],
['Goran', 'Ivan', 'Data2'],
['Leander', 'Paes', 'Data2'],
['Leander', 'Paes', 'Data2'],
);
my %seen;
foreach my $unique (
grep {
not $seen{
join('', #{ $_ })
}++
} #data
) {
print join(',', #{ $unique }), "\n";
}
print "\n";
# List of "objects", keys are table column names
#data = (
{ first => 'Rafa', last => 'Nadal', data => 'Data1' },
{ first => 'Goran', last => 'Ivan', data => 'Data2' },
{ first => 'Leander', last => 'Paes', data => 'Data2' },
{ first => 'Leander', last => 'Paes', data => 'Data2' },
);
%seen = ();
my #key_order = qw(first last data);
foreach my $unique (
grep {
not $seen{
join('', #{ $_ }{ #key_order } )
}++
} #data
) {
print join(',', #{ $unique }{ #key_order }), "\n";
}
Output:
$ perl dummy.pl
Rafa,Nadal,Data1
Goran,Ivan,Data2
Leander,Paes,Data2
Rafa,Nadal,Data1
Goran,Ivan,Data2
Leander,Paes,Data2

The shown sub is good for the job, with an array which for elements has array references. That is indeed a basic way to organize 2D data, where your rows are arrayrefs.
There are modules that can be leveraged for this, but this good old method works fine as well
use warnings;
use strict;
use Data::Dump qw(dd);
sub uniq_arys {
my %seen;
grep { not $seen{join $;, #$_}++ } #_;
}
my #data = (
[ qw(one two three) ],
[ qw(ten eleven twelve) ],
[ qw(10 11 12) ],
[ qw(ten eleven twelve) ],
);
my #data_uniq = uniq_arys(#data);
dd \#data_uniq;
Prints as expected (last row is gone), using Data::Dump to show data.
The sub works by joining each array into a string, and those are then checked for duplicates using a hash. The $; is a subscript separator, and an empty string '' is just fine instead.
This approach creates a lot of ancillary data -- in principle doubles the data -- and if performance becomes a problem it may be better to simply compare element-wise (at the cost of complexity). This can be an issue only with rather large data sets.
A module example: use uniq_by from List::UtilsBy
use List::UtilsBy qw(uniq_by);
my #no_dupes = uniq_by { join '', #$_ } #data;
This does, more or less, the same as the sub above.

Related

Manipulate and access the contents of a hash of hashes

I am having trouble with writing a Perl script.
This is the task:
My code works fine but has two issues.
I want to add an element to the hash %grocery, which contains category, brand and price. When adding the item, first the system will ask for the category.
If the category does not exist then it will add a new category, brand and price from the user, but if the category already exists then it will take the brand name and price from the user and append it to the existing category.
When I try to do so it erases the preexisting items. I want the previous items appended with the newly added item.
This issue is with the max value. To find the maximum price in the given hash. I am getting garbage value for that.
What am I doing wrong?
Here is my full code:
use strict;
use warnings;
use List::Util qw(max);
use feature "switch";
my $b;
my $c;
my $p;
my $highest;
print "____________________________STORE THE ITEM_____________________\n";
my %grocery = (
"soap" => { "lux" => 13.00, "enriche" => 11.00 },
"detergent" => { "surf" => 18.00 },
"cleaner" => { "domex" => 75.00 }
);
foreach my $c ( keys %grocery ) {
print "\n";
print "$c\n";
foreach my $b ( keys %{ $grocery{$c} } ) {
print "$b:$grocery{$c}{$b}\n";
}
}
my $ch;
do {
print "________________MENU_________________\n";
print "1.ADD ITEM\n";
print "2.SEARCH\n";
print "3.DISPLAY\n";
print "4.FIND THE MAX PRICE\n";
print "5.EXIT\n";
print "enter your choice \n";
$ch = <STDIN>;
chomp( $ch );
given ( $ch ) {
when ( 1 ) {
print "Enter the category you want to add";
$c = <STDIN>;
chomp( $c );
if ( exists( $grocery{$c} ) ) {
print "Enter brand\n";
$b = <STDIN>;
chomp( $b );
print "Enter price\n";
$p = <STDIN>;
chomp( $p );
$grocery{$c} = { $b, $p };
print "\n";
}
else {
print "Enter brand\n";
$b = <STDIN>;
chomp( $b );
print "Enter price\n";
$p = <STDIN>;
chomp( $p );
$grocery{$c} = { $b, $p };
print "\n";
}
}
when ( 2 ) {
print "Enter the item that you want to search\n";
$c = <STDIN>;
chomp( $c );
if ( exists( $grocery{$c} ) ) {
print "category $c exists\n\n";
print "Enter brand\n";
$b = <STDIN>;
chomp( $b );
if ( exists( $grocery{$c}{$b} ) ) {
print "brand $b of category $c exists\n\n";
print "-----$c-----\n";
print "$b: $grocery{$c}{$b}\n";
}
else {
print "brand $b does not exists\n";
}
}
else {
print "category $c does not exists\n";
}
}
when ( 3 ) {
foreach $c ( keys %grocery ) {
print "$c:\n";
foreach $b ( keys %{ $grocery{$c} } ) {
print "$b:$grocery{$c}{$b}\n";
}
}
}
when ( 4 ) {
print "\n________________PRINT HIGHEST PRICED PRODUCT____________________\n";
$highest = max values %grocery;
print "$highest\n";
}
}
} while ( $ch != 5 );
When I try to do so it erases the preexisting items. I want the previous items appended with the newly added item.
In this line you are overwriting the value of $grocery{$c} with a new hash reference.
$grocery{$c}={$b,$p};
Instead, you need to edit the existing hash reference.
$grocery{$c}->{$b} = $p;
That will add a new key $b to the existing data structure inside of $grocery{$b} and assign it the value of $p.
Let's take a look at what that means. I've added this to the code after %grocery gets initialized.
use Data::Dumper;
print Dumper \%grocery;
We will get the following output. Hashes are not sorted, so the order might be different for you.
$VAR1 = {
'cleaner' => {
'domex' => '75'
},
'detergent' => {
'surf' => '18'
},
'soap' => {
'enriche' => '11',
'lux' => '13'
}
};
As you can see we have hashes inside of hashes. In Perl, references are used to construct a multi level data structure. You can see that from the curly braces {} in the output. The very first one after $VAR1 is because I passed a reference of $grocery to Dumper by adding the backslash \ in front.
So behind the value for $grocery{"cleaner"} is a hash reference { "domex" => 75 }. To reach into that hash reference, you need to use the dereferencing operator ->. You can then put a new key into that hash ref like I showed above.
# ##!!!!!!!!!!
$grocery{"cleaner"}->{"foobar"} = 30;
I've marked the relevant parts above with a comment. You can read up on this stuff in these documents: perlreftut, perllol, perldsc and perlref.
This issue is with the max value. To find the max of the values of the given hash. I am getting garbage value for that.
This problem is also based on the fact that you don't yet understand references.
$highest = max values %grocery;
Your code will only take the values directly inside %grocery. If you scroll up and look at the Dumper output again, you'll see that there are three hash references inside of %grocery. Now if you do not dereference them, you just get their scalar representation. A scalar in Perl is a single value, like a number or a string. But for references it is their type and address. What looks like garbage is in fact the memory address of the three hash references in %grocery which has the highest number.
Of course that's not what you want. You need to iterate both levels of your data structure, collect all values and then find the highest one.
my #all_prices;
foreach my $category (keys %grocery) {
push #all_prices, values %{ $grocery{$category} };
}
$highest = max #all_prices;
print "$highest\n";
I chose a very verbose approach to do that. It iterates over all categories in %grocery and then grabs all the values of the hash reference stored behind each of them. Those get added to an array, and in the end we can take the max of all of them from the array.
You have the exact same code for the when a category already exists and when it does not. The line
$grocery{$c} = { $b, $p };
replaces the entire hash for category $c. That's fine for new categories, but if the category is already there then it will throw away any existing information
You need to write
$grocery{$c}{$b} = $p;
And please add a lot more whitespace around operators, separating the elements of lists, and delineating related sequences of statements
With regard to finding the maximum price, your line
$highest = max values %grocery;
is trying to calculate the maximum of the hash references corresponding to the categories
Since there are two levels of hash here, you need
$highest = max map { values %$_ } values %grocery;
but that may not be the way you're expected to do it. If in doubt then you should use two nested for loops
use List::Util qw(max);
use Data::Dumper;
my $grocery =
{
"soap" => { "lux" => 13.00, "enriche" => 11.00 },
"detergent" => { "surf" => 18.00 },
"cleaner" => { "domex"=> 75.00 }
};
display("unadulterated list");
print Dumper $grocery;
display("new silky soap");
$grocery->{"soap"}->{"silky"} = 12.50;
print Dumper $grocery;
display("new mega cleaner");
$grocery->{"cleaner"}->{"megaclean"} = 99.99;
print Dumper $grocery;
display("new exfoliant soap");
$grocery->{"soap"}->{"exfoliant"} = 23.75;
print Dumper $grocery;
display("lux soap gets discounted");
$grocery->{"soap"}->{"lux"} = 9.00;
print Dumper $grocery;
display("domex cleaner is discontinued");
delete $grocery->{"cleaner"}->{"domex"};
print Dumper $grocery;
display("most costly soap product");
my $max = max values $grocery->{soap};
print $max, "\n\n";
sub display
{
printf("\n%s\n%s\n%s\n\n", '-' x 45, shift, '-' x 45 );
}

Recursive sorting in Perl

I have a hash that contains keys that correspond to database subscripts, but the database can have multidimensional records so the key could be a single subscript, or a list of subscripts.
I need to find a way to sort these records so I can print them in a logical order.
Example:
my $data = {
'1,1,1' => 'data1',
'1,2' => 'data2',
'1,1,3' => 'stuff',
'2,1,1' => 'data3',
'2,1,2' => 'data4',
'2,1,3' => 'data blah',
'2,2,2' => 'datawk2n',
'3,1,2' => 'more',
};
# Should print the keys in the properly sorted order
print join "\n", sort some_function keys %$data;
sub some_function {
# Do some sorting magikz
}
I want it to sort by the leftmost subscript first. If the leftmost value is identical I want to move to the next value and compare those. If those are identical I want to continue to the next one ... and so on ... until all possibilities are exhausted.
This will most likely involve some recursion, but I can't figure out how to make recursion work with those fancy $a and $b variables.
What can I put in some_function to get the following output?
1,1,1
1,1,3
1,2
2,1,1
2,1,2
2,1,3
2,2,2
3,1,2
The following is the fastest solution (by far!):
my #sorted_keys =
map { join ',', unpack 'N*', $_ }
sort
map { pack 'N*', split /,/, $_ }
keys(%$data);
If you want something simpler, and still quite fast, you could use a "natural sort".
Sort::Key::Natural
use Sort::Key::Natural qw( natsort );
my #sorted_keys = natsort(keys(%$data));
Sort::Naturally
use Sort::Naturally qw( nsort );
my #sorted_keys = nsort(keys(%$data));
Benchmarks:
Rate SN SKN grt
SN 3769/s -- -40% -88%
SKN 6300/s 67% -- -79%
grt 30362/s 705% 382% --
Benchmark code:
use strict;
use warnings;
use Benchmark qw( cmpthese );
use List::Util qw( shuffle );
use Sort::Key::Natural qw( );
use Sort::Naturally qw( );
my #keys =
shuffle
split ' ',
'1 1,0 1,1 1,1,1 1,1,3 1,2 2,1,1 2,1,2 2,1,3 2,2,2 3,1,2 10,1,1';
sub grt {
my #sorted_keys =
map { join ',', unpack 'N*', $_ }
sort
map { pack 'N*', split /,/, $_ }
#keys;
}
sub SKN { my #sorted_keys = Sort::Key::Natural::natsort(#keys); }
sub SN { my #sorted_keys = Sort::Naturally::nsort(#keys); }
cmpthese(-3, {
grt => \&grt,
SKN => \&SKN,
SN => \&SN,
});
I thought the Sort::Naturally module would help you here, but it seems not
I must have had a bug in my test. This works fine
use Sort::Naturally 'nsort';
say for nsort keys %$data;
I recommend either this or the Sort::Key::Naturally solution as they are the clearest
It is bad practice to chase speed of execution, especially at the expense of readability, before there is evidence that a given solution is too slow. Even then it is foolish to randomly optimise chunks of your code in the hope of making a difference, and your solution should be run through a profiler to discover where it would be most fruitful to make enhancements
There is no need for recursion. This program shows a sort subroutine by_elements which simply compares each item in the list until it finds either a mismatch or the end of one of the lists
In the former case the result is just the comparison of the two differ elements, and in the latter it is a comparison of the number of elements in the two lists
use strict;
use warnings 'all';
use feature 'say';
my $data = {
'1,1,1' => 'data1',
'1,2' => 'data2',
'1,1,3' => 'stuff',
'2,1,1' => 'data3',
'2,1,2' => 'data4',
'2,1,3' => 'data blah',
'2,2,2' => 'datawk2n',
'3,1,2' => 'more',
'10,1,1' => 'odd',
'1,1' => 'simple',
'1,0' => 'simple0',
'1' => 'simpler',
};
say for sort by_elements keys %$data;
sub by_elements {
my ( $aa, $bb ) = map [/\d+/g], $a, $b;
for ( my $i = 0; $i < #$aa and $i < #$bb; ++$i ) {
my $cmp = $aa->[$i] <=> $bb->[$i];
return $cmp if $cmp;
}
return #$aa <=> #$bb;
}
output
1
1,0
1,1
1,1,1
1,1,3
1,2
2,1,1
2,1,2
2,1,3
2,2,2
3,1,2
10,1,1
Use natsort of Sort::Key::Natural:
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
use Sort::Key::Natural qw(natsort);
my %data = (
'1,1,1' => 'data1',
'1,2' => 'data2',
'1,1,3' => 'stuff',
'2,1,1' => 'data3',
'2,1,2' => 'data4',
'2,1,3' => 'data blah',
'2,2,2' => 'datawk2n',
'10,1,2' => 'more',
);
say for natsort keys %data;
Outputs:
1,1,1
1,1,3
1,2
2,1,1
2,1,2
2,1,3
2,2,2
10,1,2
No need for recursion, just a loop that you can break out of.
sub some_function {
my #aa = split /,/, $a;
my #bb = split /,/, $b;
my $cmp = 0;
for (my $i=0; $i<#aa || $i<#bb; $i++) {
$cmp = $aa[$i] <=> $bb[$i];
last if $cmp;
}
$cmp;
}
But if your heart is set on a recursive solution, there's
sub aref_sort_recurse {
my ($c,$d) = #_;
#$c ? #$d ? shift #$c <=> shift #$d || aref_sort_recurse($c,$d) ? 1 : -#$d
}
sub some_function {
aref_sort_recurse( [split /,/, $a], [split /,/, $b] )
}

Maintain order within a hash of hashes, and output as .csv

Here's my code:
my %hash = (
'2564' => {
'st_responsible' => 'mname1',
'critical' => '',
'last_modified_by' => 'teamname1',
'transstatus' => '',
'rt_res' => 'pname1'
},
'2487' => {
'st_responsible' => 'mname2',
'critical' => '',
'last_modified_by' => 'teamname2',
'transstatus' => '',
'rt_res' => ''
}
);
print "xnum,st_responsible,critical,last_modified_by,transstatus,rt_res\n";
foreach my $x_number (sort keys %hash)
{
print "$x_number";
foreach my $element (keys %{$hash{$x_number}})
{
print ",$hash{$x_number}{$element}";
}
print "\n";
}
Expected output
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,teamname2,,
2564,mname1,,teamname1,,pname1
Actual output
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,,teamname2,
2564,mname1,,,teamname1,pname1
Please help in letting me know as to how exactly do I preserve the order of this data structure, and then write this to a CSV file.
I would suggest that for this, you'd be better off doing this with a slice, which is a way of extracting a list of values from a hash in a particular order?
#configure field order
my #order = qw ( st_responsible critical last_modified_by transstatus rt_res );
#print header row
print join (",", "xnum", #order ),"\n";
#iterate the rows
foreach my $key ( sort keys %hash ) {
#extract hash slice and join it with commas
print join ( ",", $key, #{$hash{$key}}{#order} ),"\n";
}
This gives:
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,teamname2,,
2564,mname1,,teamname1,,pname1
You can consider Text::CSV - but I'd suggest in this scenario it's overkill, best used when you've got quotes and quoted field separators to worry about. (And you don't).
If you have to deal with not just empty keys, but missing ones, you can make use of map:
my #order = qw ( st_responsible critical last_modified_by
transstatus missing rt_res extra_field_here );
print join (",", "xnum", #order ),"\n";
foreach my $key ( sort keys %hash ) {
print join ( ",", $key, map { $_ // '' } #{$hash{$key}}{#order} ),"\n";
}
(Otherwise you'll get a warning about an undef value).
Perl doesn't guarantee the order of items in the hash, this is the root cause of the issue. Even two different hashes with the same keys can have different order of keys. It may also differ from platform to platform and architecture and perl version.
You need to define another array with the list of keys which you want to print in correct order.
my #keys = qw(st_responsible critical last_modified_by transstatus rt_res);
foreach my $element (#keys) {
... print the value
}
EDIT: As you're trying to write CSV file, consider using Text::CSV which takes care about special characters, correct formatting and things like that.
There's probably a slicker way of achieving this, but give this a go:
use warnings;
use strict;
open my $csv_out, '>', 'out.csv' or die $!;
my #keys = qw(2487 2564);
my #vals = qw(st_responsible critical last_modified_by transstatus rt_res);
print $csv_out "xnum,st_responsible,critical,last_modified_by,transstatus,rt_res\n";
for my $key (#keys){
print $csv_out "$key,";
for my $vals (#vals){
$vals eq $vals[-1] ? print $csv_out "$hash{$key}{$vals}\n" : print $csv_out "$hash{$key}{$vals},";
}
}
This will print out comma-separated values to a csv file out.csv maintaining your original order (by iterating over arrays). If it's the last value it will print a newline.
--- OUTPUT ---
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,teamname2,,
2564,mname1,,teamname1,,pname1

Build hash of hash in perl

I'm new to using perl and I'm trying to build a hash of a hash from a tsv. My current process is to read in a file and construct a hash and then insert it into another hash.
my %hoh = ();
while (my $line = <$tsv>)
{
chomp $line;
my %hash;
my #data = split "\t", $line;
my $id;
my $iter = each_array(#columns, #data);
while(my($k, $v) = $iter->())
{
$hash{$k} = $v;
if($k eq 'Id')
{
$id = $v;
}
}
$hoh{$id} = %hash;
}
print "dump: ", Dumper(%hoh);
This outputs:
dump
$VAR1 = '1234567890';
$VAR2 = '17/32';
$VAR3 = '1234567891';
$VAR4 = '17/32';
.....
Instead of what I would expect:
dump
{
'1234567890' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567890'
},
'1234567891' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567891'
},
........
};
My limited understanding is that when I do $hoh{$id} = %hash; its inserting in a reference to %hash? What am I doing wrong? Also is there a more succint way to use my columns and data array's as key,value pairs into my %hash object?
-Thanks in advance,
Niru
To get a reference, you have to use \:
$hoh{$id} = \%hash;
%hash is the hash, not the reference to it. In scalar context, it returns the string X/Y wre X is the number of used buckets and Y the number of all the buckets in the hash (i.e. nothing useful).
To get a reference to a hash variable, you need to use \%hash (as choroba said).
A more succinct way to assign values to columns is to assign to a hash slice, like this:
my %hoh = ();
while (my $line = <$tsv>)
{
chomp $line;
my %hash;
#hash{#columns} = split "\t", $line;
$hoh{$hash{Id}} = \%hash;
}
print "dump: ", Dumper(\%hoh);
A hash slice (#hash{#columns}) means essentially the same thing as ($hash{$columns[0]}, $hash{$columns[1]}, $hash{$columns[2]}, ...) up to however many columns you have. By assigning to it, I'm assigning the first value from split to $hash{$columns[0]}, the second value to $hash{$columns[1]}, and so on. It does exactly the same thing as your while ... $iter loop, just without the explicit loop (and it doesn't extract the $id).
There's no need to compare each $k to 'Id' inside a loop; just store it in the hash as a normal field and extract it afterwards with $hash{Id}. (Aside: Is your column header Id or id? You use Id in your loop, but id in your expected output.)
If you don't want to keep the Id field in the individual entries, you could use delete (which removes the key from the hash and returns the value):
$hoh{delete $hash{Id}} = \%hash;
Take a look at the documentation included in Perl. The command perldoc is very helpful. You can also look at the Perldoc webpage too.
One of the tutorials is a tutorial on Perl references. It all help clarify a lot of your questions and explain about referencing and dereferencing.
I also recommend that you look at CPAN. This is an archive of various Perl modules that can do many various tasks. Look at Text::CSV. This module will do exactly what you want, and even though it says "CSV", it works with tab separated files too.
You missed putting a slash in front of your hash you're trying to make a reference. You have:
$hoh{$id} = %hash;
Probably want:
$hoh{$id} = \%hash;
also, when you do a Data::Dumper of a hash, you should do it on a reference to a hash. Internally, hashes and arrays have similar structures when a Data::Dumper dump is done.
You have:
print "dump: ", Dumper(%hoh);
You should have:
print "dump: ", Dumper( \%hoh );
My attempt at the program:
#! /usr/bin/env perl
#
use warnings;
use strict;
use autodie;
use feature qw(say);
use Data::Dumper;
use constant {
FILE => "test.txt",
};
open my $fh, "<", FILE;
#
# First line with headers
#
my $line = <$fh>;
chomp $line;
my #headers = split /\t/, $line;
my %hash_of_hashes;
#
# Rest of file
#
while ( my $line = <$fh> ) {
chomp $line;
my %line_hash;
my #values = split /\t/, $line;
for my $index ( ( 0..$#values ) ) {
$line_hash{ $headers[$index] } = $values[ $index ];
}
$hash_of_hashes{ $line_hash{id} } = \%line_hash;
}
say Dumper \%hash_of_hashes;
You should only store a reference to a variable if you do so in the last line before the variable goes go of scope. In your script, you declare %hash inside the while loop, so placing this statement as the last in the loop is safe:
$hoh{$id} = \%hash;
If it's not the last statement (or you're not sure it's safe), create an anonymous structure to hold the contents of the variable:
$hoh{$id} = { %hash };
This makes a copy of %hash, which is slower, but any subsequent changes to it will not effect what you stored.

sum hash of hash values using perl

I have a Perl script that parses an Excel file and does the following : It counts for each value in column A, the number of elements it has in column B, the script looks like this :
use strict;
use warnings;
use Spreadsheet::XLSX;
use Data::Dumper;
use List::Util qw( sum );
my $col1 = 0;
my %hash;
my $excel = Spreadsheet::XLSX->new('inout_chartdata_ronald.xlsx');
my $sheet = ${ $excel->{Worksheet} }[0];
$sheet->{MaxRow} ||= $sheet->{MinRow};
my $count = 0;
# Iterate through each row
foreach my $row ( $sheet->{MinRow}+1 .. $sheet->{MaxRow} ) {
# The cell in column 1
my $cell = $sheet->{Cells}[$row][$col1];
if ($cell) {
# The adjacent cell in column 2
my $adjacentCell = $sheet->{Cells}[$row][ $col1 + 1 ];
# Use a hash of hashes
$hash{ $cell->{Val} }{ $adjacentCell->{Val} }++;
}
}
print "\n", Dumper \%hash;
The output looks like this :
$VAR1 = {
'13' => {
'klm' => 1,
'hij' => 2,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
};
This works great, my question is : How can I access the elements of this output $VAR1 in order to do : for value 13, klm + hij = 3 and get a final output like this :
$VAR1 = {
'13' => {
'somename' => 3,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
};
So basically what I want to do is loop through my final hash of hashes and access its specific elements based on a unique key and finally do their sum.
Any help would be appreciated.
Thanks
I used #do_sum to indicate what changes you want to make. The new key is hardcoded in the script. Note that the new key is not created if no key exists in the subhash (the $found flag).
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my %hash = (
'13' => {
'klm' => 1,
'hij' => 2,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
);
my #do_sum = qw(klm hij);
for my $num (keys %hash) {
my $found;
my $sum = 0;
for my $key (#do_sum) {
next unless exists $hash{$num}{$key};
$sum += $hash{$num}{$key};
delete $hash{$num}{$key};
$found = 1;
}
$hash{$num}{somename} = $sum if $found;
}
print Dumper \%hash;
It sounds like you need to learn about Perl References, and maybe Perl Objects which are just a nice way to deal with references.
As you know, Perl has three basic data-structures:
Scalars ($foo)
Arrays (#foo)
Hashes (%foo)
The problem is that these data structures can only contain scalar data. That is, each element in an array can hold a single value or each key in a hash can hold a single value.
In your case %hash is a Hash where each entry in the hash references another hash. For example:
Your %hash has an entry in it with a key of 13. This doesn't contain a scalar value, but a references to another hash with three keys in it: klm, hij, and lkm. YOu can reference this via this syntax:
${ hash{13} }{klm} = 1
${ hash{13} }{hij} = 2
${ hash{13} }{lkm} = 4
The curly braces may or may not be necessary. However, %{ hash{13} } references that hash contained in $hash{13}, so I can now reference the keys of that hash. You can imagine this getting more complex as you talk about hashes of hashes of arrays of hashes of arrays. Fortunately, Perl includes an easier syntax:
$hash{13}->{klm} = 1
%hash{13}->{hij} = 2
%hash{13}->{lkm} = 4
Read up about hashes and how to manipulate them. After you get comfortable with this, you can start working on learning about Object Oriented Perl which handles references in a safer manner.