accessing "pair hash" references used for column specification

accessing "pair hash" references used for column specification - perl

I'm building a small library that transforms log data into a CSV-like file that can be imported by spreadsheet software. For the output, I'm interested in an option to display human-friendly captions for the table columns if needed. This should be an option so that the tool can also be used with minimal effort. I came up with an array for column specification that contains plain scalars for keys or hash references with a pair of key and value. I'm accessing these via keys and values which looks a bit strange to me.
Is there a simpler way to access key and value of a hash that contains just a pair?
(I tried to simplify the code as much as possible.)
#!/usr/bin/perl -w
use strict;
use warnings;
# some sample data
my #rows = (
{x => 1, y => 2},
{y => 5, z => 6},
{x => 7, z => 9},
);
sub print_table {
my #columns = #_; # columns of interest with optional header replacement
my #keys; # for accessing the data values
my #captions; # for display on column headers
for (#columns) {
if (ref($_) eq 'HASH') {
push #keys, keys %$_;
push #captions, values %$_;
} else {
push #keys, $_;
push #captions, $_;
}
}
print join ("\t", #captions), "\n";
for my $row (#rows) {
print join ("\t", (map {$row->{$_} // ''} #keys)), "\n";
}
}
print_table({x=>'u'}, 'y');

All you need:
my ($k, $v) = %hash;
So
my ($k, $v) = %$_;
push #keys, $k;
push #captions, $v;
or
push #keys, ( %$_ )[0];
push #captions, ( %$_ )[1];

Use each.
From perldoc:
When called on a hash in list context, returns a 2-element list
consisting of the key and value for the next element of a hash.
But don't forget to reset the iterator using keys(%hash) afterwards or subsequent each will fail.
my ($k, $v) = each(%$_);
keys(%$_);

Related

unique value in hash while merging their keys

I have a file with tab separated columns like this:
TR1"\t"P0C134
TR2"\t"P0C133
TR2"\t"P0C136
Now I split these into two arrays (one for each column values) then convert them into hashes but I want to remove the duplicates (here its TR2) while merging their right column values...something like this TR2=>P0C133,P0C136...how is it possible?? is there any function to do it in perl??
for($i=0;$i<=scalar#s_arr;$i++)
{
if($s_arr[$i] eq $s_arr[$i+1])
{ push(#temp,$idx_arr[$i]); }
else
{
if(#temp eq "")
{ $s_hash{$s_arr[$i]}=$idx_arr[$i]; }
else
{
$idx_str=join(",",#temp);
$s_hash{$s_arr[$i]}=$idx_str;
#temp="";
}
}
}
this is code I've written where #s_arr is storing left column values and #idx_arr is storing right column value

You can avoid using two arrays and perform what you want in one fell swoop treating the left-side value as the hash key and making it an array reference, then pushing the right-side values that correlate with that key onto that aref:
use warnings;
use strict;
use Data::Dumper;
my %hash;
while (<DATA>){
my ($key, $val) = split;
push #{ $hash{$key} }, $val;
}
print Dumper \%hash;
__DATA__
TR1 P0C134
TR2 P0C133
TR2 P0C136
Output:
$VAR1 = {
'TR1' => [
'P0C134'
],
'TR2' => [
'P0C133',
'P0C136'
]
};

If you want that same structure output use hash of hash.
#!/usr/bin/perl
use warnings;
use strict;
my #arr = <DATA>;
my %hash;
foreach (#arr)
{
my ($k,$v) = split(/\s+/,$_);
chomp $v;
$hash{$k}{$v}++;
}
foreach my $key1 (keys %hash)
{
print "$key1=>";
foreach my $key2 (keys $hash{$key1})
{
print "$key2,";
}
print "\n";
}
__DATA__
TR1 P0C134
TR2 P0C133
TR2 P0C136
Output is:
TR2=>P0C136,P0C133,
TR1=>P0C134,

Maintain order within a hash of hashes, and output as .csv

Here's my code:
my %hash = (
'2564' => {
'st_responsible' => 'mname1',
'critical' => '',
'last_modified_by' => 'teamname1',
'transstatus' => '',
'rt_res' => 'pname1'
},
'2487' => {
'st_responsible' => 'mname2',
'critical' => '',
'last_modified_by' => 'teamname2',
'transstatus' => '',
'rt_res' => ''
}
);
print "xnum,st_responsible,critical,last_modified_by,transstatus,rt_res\n";
foreach my $x_number (sort keys %hash)
{
print "$x_number";
foreach my $element (keys %{$hash{$x_number}})
{
print ",$hash{$x_number}{$element}";
}
print "\n";
}
Expected output
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,teamname2,,
2564,mname1,,teamname1,,pname1
Actual output
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,,teamname2,
2564,mname1,,,teamname1,pname1
Please help in letting me know as to how exactly do I preserve the order of this data structure, and then write this to a CSV file.

I would suggest that for this, you'd be better off doing this with a slice, which is a way of extracting a list of values from a hash in a particular order?
#configure field order
my #order = qw ( st_responsible critical last_modified_by transstatus rt_res );
#print header row
print join (",", "xnum", #order ),"\n";
#iterate the rows
foreach my $key ( sort keys %hash ) {
#extract hash slice and join it with commas
print join ( ",", $key, #{$hash{$key}}{#order} ),"\n";
}
This gives:
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,teamname2,,
2564,mname1,,teamname1,,pname1
You can consider Text::CSV - but I'd suggest in this scenario it's overkill, best used when you've got quotes and quoted field separators to worry about. (And you don't).
If you have to deal with not just empty keys, but missing ones, you can make use of map:
my #order = qw ( st_responsible critical last_modified_by
transstatus missing rt_res extra_field_here );
print join (",", "xnum", #order ),"\n";
foreach my $key ( sort keys %hash ) {
print join ( ",", $key, map { $_ // '' } #{$hash{$key}}{#order} ),"\n";
}
(Otherwise you'll get a warning about an undef value).

Perl doesn't guarantee the order of items in the hash, this is the root cause of the issue. Even two different hashes with the same keys can have different order of keys. It may also differ from platform to platform and architecture and perl version.
You need to define another array with the list of keys which you want to print in correct order.
my #keys = qw(st_responsible critical last_modified_by transstatus rt_res);
foreach my $element (#keys) {
... print the value
}
EDIT: As you're trying to write CSV file, consider using Text::CSV which takes care about special characters, correct formatting and things like that.

There's probably a slicker way of achieving this, but give this a go:
use warnings;
use strict;
open my $csv_out, '>', 'out.csv' or die $!;
my #keys = qw(2487 2564);
my #vals = qw(st_responsible critical last_modified_by transstatus rt_res);
print $csv_out "xnum,st_responsible,critical,last_modified_by,transstatus,rt_res\n";
for my $key (#keys){
print $csv_out "$key,";
for my $vals (#vals){
$vals eq $vals[-1] ? print $csv_out "$hash{$key}{$vals}\n" : print $csv_out "$hash{$key}{$vals},";
}
}
This will print out comma-separated values to a csv file out.csv maintaining your original order (by iterating over arrays). If it's the last value it will print a newline.
--- OUTPUT ---
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,teamname2,,
2564,mname1,,teamname1,,pname1

Perl list all keys in hash with identical values

If I have a colon-delimited file name FILE and I do:
cat FILE|perl -F: -lane 'my %hash = (); $hash{#F[0]} = #F[2]'
to assign the first and 3rd tokens as the key => value pairs for the hash..
1) Is that a sane way to assign key value pairs to a hash?
2) What is the simplest way to now find all keys with shared values and list them?
Assume FILE looks like:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male
Desired Output: Keys with value "Apple": Mike,Jared,David,Sam

Your example won't work as you want because the -n option puts a while loop around your one-line program, so the hash you declare is created and destoyed for every record in the file. You could get around that by not declaring the hash, and so making it a persistent package variable which will retain all values stored in it.
You can then write push #{ $hash{$F[2]} }, $F[0] but notice that it should be $F[0] etc. and not #F[0], and I have used push to create a list of column 1 values for each column 3 value instead of just a list of one-to-one values relating each column 1 value with its column 3 value.
To clarify, your method produces a hash looking like this, which has to be searched to produce the display that you want.
(
Beth => "Maize",
David => "Apple",
Don => "Corn",
Jared => "Apple",
Mike => "Apple",
Sam => "Apple",
)
while mine creates this, which as you can see is pretty much already in the form you want.
(
Apple => ["Mike", "Jared", "Sam", "David"],
Corn => ["Don"],
Maize => ["Beth"],
)
But I think this problem is a bit too big to be solved with a one-line Perl program. The solution below expects the path to the input file as a command-line parameter, like this
> perl prog.pl colons.csv
but it will default to myfile.csv if no file is specified.
use strict;
use warnings;
our #ARGV = 'myfile.csv' unless #ARGV;
my %data;
while (<>) {
my #fields = split /:/;
push #{ $data{$fields[2]} }, $fields[0];
}
while (my ($k, $v) = each %data) {
next unless #$v > 1;
printf qq{Keys with value "%s": %s\n}, $k, join ', ', #$v;
}
output
Keys with value "Apple": Mike, Jared, Sam, David

use strict;
use warnings;
open my $in, '<', 'in.txt';
my %data;
while(<$in>){
chomp;
my #split = split/:/;
$data{$split[0]} = $split[2];
}
my $query = 'Apple';
print "Keys with value $query = ";
foreach my $name (keys %data){
print "$name " if $data{$name} eq $query;
}
print "\n";

Arrays are used to hold list of values, so use an array.
perl -F: -lane'
push #{ $h{$F[2]} }, $F[0];
END {
for my $fruit (keys %h) {
next if #{ $h{$fruit} } < 2;
print "$fruit: ", join(",", #{ $h{$fruit} });
}
}
' FILE
The END block is executed on exit. In it, we iterate over the keys of the hash. If the value of the current hash element is an array with only one element, it's skipped. Otherwise, we prints the key followed by contents of the array referenced by the hash element.

Here is another way:
perl -F: -lane'
push #{ $h{$F[2]} }, $F[0];
}{
print "$_: ", join(",", #{ $h{$_} }) for grep { #{$h{$_}} > 1 } keys %h;
' file
We read each line and create hash of arrays using third column as key and first column as list of values for matching key. In the END block we iterate over our hash using grep and filter keys whose array count greater than 1 and print the key followed by array elements.

It doesn't have to be a one liner,
Good. It's not going to be...
Is that a sane way to assign key value pairs to a hash?
You're simply assigning the key value pairs as:
$hash{"key"} = "value";
Which is about as simple as it gets. There might be a way of doing it via map. However, the main issue I see is what should happen if you have duplicate keys.
Let's say your file looks like this:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male # Note this entry is here twice!
David:35:Wheat:Male # Note this entry is here twice!
Let's do a simple assignment loop:
my %hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
}
When you get to $hash{David}, it will first be set to Apple, but then you change the value to Wheat. There are four ways you can handle this:
Use whatever the last value is. No change in the loop.
Use the first value and ignore subsequent values. Simple enough to do.
If that happens, it's an error. Abort the program and report the error.
Keep all values.
This last one is the most interesting because it involves a reference to an array as the values for your hash:
my %hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = [] if not exists $hash{$name}; # I'm making this an array reference
push #{ $hash{$name} }, $category;
}
Now, each value in my hash is a reference to an array:
my #values = #{ $hash{David} ); # The values of David...
print "David is in categories " . join ( ", ", #values ) . "\n";
This will print out David is in categories Wheat, Apple
What is the simplest way to now find all keys with shared values and list them?
The easiest way is to create a second hash that's keyed by your value. In this hash, you will need to use an array reference. Let's assume no duplicate names for now:
my %hash;
my %indexed_hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
my $indexed_hash{$category} = [] if not exist $indexed_hash{$category};
push #{ $indexed_hash{$category} }, $name;
}
Now, if I want to find all the duplicates of Apple:
my #names = #{ $indexed_hash{Apple} };
print "The following are in 'Apple': " . join ( ", " #names ) . "\n";
Since we're getting into references, we could take things a step further and store all of your values of your file in your hash. Again, for simplicity, I am assuming that you will have one and only one entry per name:
my %hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name}->{AGE} = $age;
$hash{$name}->{CATEGORY} = $category;
$hash{$name}->{SEX} = $sex;
}
for my $name ( sort keys %hash ) {
print "$name Information:\n";
print " Age: " . $hash{$name}->{AGE} . "\n";
printf "Category: %s\n", $hash{$name}->{CATEGORY};
print " Sex: #{[$hash{$name}->{SEX}]}\n\n";
}
That last two statements are easier ways of interpolating complex data structures into a string. The printf is fairly clear. The second #{[...]} is a neat little trick.

What have you tried?
If you reverse the hash into a list of value => key pairs then use List::Util's pairs() against the list, you can transform the hash into a hash of values => key arrayrefs. i.e. ( foo => [ 'bar', 'baz' ] ), grep {#{$hash{$_}} > 1} keys %hash, and print the results.

Build hash of hash in perl

I'm new to using perl and I'm trying to build a hash of a hash from a tsv. My current process is to read in a file and construct a hash and then insert it into another hash.
my %hoh = ();
while (my $line = <$tsv>)
{
chomp $line;
my %hash;
my #data = split "\t", $line;
my $id;
my $iter = each_array(#columns, #data);
while(my($k, $v) = $iter->())
{
$hash{$k} = $v;
if($k eq 'Id')
{
$id = $v;
}
}
$hoh{$id} = %hash;
}
print "dump: ", Dumper(%hoh);
This outputs:
dump
$VAR1 = '1234567890';
$VAR2 = '17/32';
$VAR3 = '1234567891';
$VAR4 = '17/32';
.....
Instead of what I would expect:
dump
{
'1234567890' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567890'
},
'1234567891' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567891'
},
........
};
My limited understanding is that when I do $hoh{$id} = %hash; its inserting in a reference to %hash? What am I doing wrong? Also is there a more succint way to use my columns and data array's as key,value pairs into my %hash object?
-Thanks in advance,
Niru

To get a reference, you have to use \:
$hoh{$id} = \%hash;
%hash is the hash, not the reference to it. In scalar context, it returns the string X/Y wre X is the number of used buckets and Y the number of all the buckets in the hash (i.e. nothing useful).

To get a reference to a hash variable, you need to use \%hash (as choroba said).
A more succinct way to assign values to columns is to assign to a hash slice, like this:
my %hoh = ();
while (my $line = <$tsv>)
{
chomp $line;
my %hash;
#hash{#columns} = split "\t", $line;
$hoh{$hash{Id}} = \%hash;
}
print "dump: ", Dumper(\%hoh);
A hash slice (#hash{#columns}) means essentially the same thing as ($hash{$columns[0]}, $hash{$columns[1]}, $hash{$columns[2]}, ...) up to however many columns you have. By assigning to it, I'm assigning the first value from split to $hash{$columns[0]}, the second value to $hash{$columns[1]}, and so on. It does exactly the same thing as your while ... $iter loop, just without the explicit loop (and it doesn't extract the $id).
There's no need to compare each $k to 'Id' inside a loop; just store it in the hash as a normal field and extract it afterwards with $hash{Id}. (Aside: Is your column header Id or id? You use Id in your loop, but id in your expected output.)
If you don't want to keep the Id field in the individual entries, you could use delete (which removes the key from the hash and returns the value):
$hoh{delete $hash{Id}} = \%hash;

Take a look at the documentation included in Perl. The command perldoc is very helpful. You can also look at the Perldoc webpage too.
One of the tutorials is a tutorial on Perl references. It all help clarify a lot of your questions and explain about referencing and dereferencing.
I also recommend that you look at CPAN. This is an archive of various Perl modules that can do many various tasks. Look at Text::CSV. This module will do exactly what you want, and even though it says "CSV", it works with tab separated files too.
You missed putting a slash in front of your hash you're trying to make a reference. You have:
$hoh{$id} = %hash;
Probably want:
$hoh{$id} = \%hash;
also, when you do a Data::Dumper of a hash, you should do it on a reference to a hash. Internally, hashes and arrays have similar structures when a Data::Dumper dump is done.
You have:
print "dump: ", Dumper(%hoh);
You should have:
print "dump: ", Dumper( \%hoh );
My attempt at the program:
#! /usr/bin/env perl
#
use warnings;
use strict;
use autodie;
use feature qw(say);
use Data::Dumper;
use constant {
FILE => "test.txt",
};
open my $fh, "<", FILE;
#
# First line with headers
#
my $line = <$fh>;
chomp $line;
my #headers = split /\t/, $line;
my %hash_of_hashes;
#
# Rest of file
#
while ( my $line = <$fh> ) {
chomp $line;
my %line_hash;
my #values = split /\t/, $line;
for my $index ( ( 0..$#values ) ) {
$line_hash{ $headers[$index] } = $values[ $index ];
}
$hash_of_hashes{ $line_hash{id} } = \%line_hash;
}
say Dumper \%hash_of_hashes;

You should only store a reference to a variable if you do so in the last line before the variable goes go of scope. In your script, you declare %hash inside the while loop, so placing this statement as the last in the loop is safe:
$hoh{$id} = \%hash;
If it's not the last statement (or you're not sure it's safe), create an anonymous structure to hold the contents of the variable:
$hoh{$id} = { %hash };
This makes a copy of %hash, which is slower, but any subsequent changes to it will not effect what you stored.

How to copy keys from hash to array without duplicates?

If I have an array and hash like these
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my #a = qw/a b c d e/;
my %h = (a => 1, b => 1, f => 1, g => 1);
and I would like to end up with #a containing all of the keys from %h, and no element in the array must appear more than once.
How can that be done, as exists doesn't work on arrays?

If you have Perl 5.10 and later, you can use smart matching (~~):
for my $key (keys %h) {
push #a, $key unless $key ~~ #a;
}
Otherwise, List::Util's first can help:
for my $key (keys %h) {
push #a, $key unless first { $_ eq $key } #a;
}

You could make use of List::MoreUtils's uniq function:
use List::MoreUtils qw( uniq );
#a = uniq #a, keys %h;

Convert the values you want into hash keys, then extract them
my %foo = map { $_ => 1 } #a, keys %h;
print sort keys %foo;

How about this (admittedly destructive to `%h):
delete #h{ #a }; # delete all keys of h already in #a
push #a, keys %h; # push remaining keys onto #a
Thus #a retains the order it had and simply appends the non-duplicate keys in %h.
A word about the destructiveness: The example above illustrates some concepts of what can be done when you can afford to be destructive. And delete is certainly no more destructive than passing out of the scope of a lexical variable.
The issue can be addressed by simply copying it to another hash before narrowing the hash to those keys not found in #a.
my %h2 = %h;
delete #h2{ #a };
push #a, keys %h2;