Maintain order within a hash of hashes, and output as .csv - perl

Here's my code:
my %hash = (
'2564' => {
'st_responsible' => 'mname1',
'critical' => '',
'last_modified_by' => 'teamname1',
'transstatus' => '',
'rt_res' => 'pname1'
},
'2487' => {
'st_responsible' => 'mname2',
'critical' => '',
'last_modified_by' => 'teamname2',
'transstatus' => '',
'rt_res' => ''
}
);
print "xnum,st_responsible,critical,last_modified_by,transstatus,rt_res\n";
foreach my $x_number (sort keys %hash)
{
print "$x_number";
foreach my $element (keys %{$hash{$x_number}})
{
print ",$hash{$x_number}{$element}";
}
print "\n";
}
Expected output
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,teamname2,,
2564,mname1,,teamname1,,pname1
Actual output
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,,teamname2,
2564,mname1,,,teamname1,pname1
Please help in letting me know as to how exactly do I preserve the order of this data structure, and then write this to a CSV file.

I would suggest that for this, you'd be better off doing this with a slice, which is a way of extracting a list of values from a hash in a particular order?
#configure field order
my #order = qw ( st_responsible critical last_modified_by transstatus rt_res );
#print header row
print join (",", "xnum", #order ),"\n";
#iterate the rows
foreach my $key ( sort keys %hash ) {
#extract hash slice and join it with commas
print join ( ",", $key, #{$hash{$key}}{#order} ),"\n";
}
This gives:
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,teamname2,,
2564,mname1,,teamname1,,pname1
You can consider Text::CSV - but I'd suggest in this scenario it's overkill, best used when you've got quotes and quoted field separators to worry about. (And you don't).
If you have to deal with not just empty keys, but missing ones, you can make use of map:
my #order = qw ( st_responsible critical last_modified_by
transstatus missing rt_res extra_field_here );
print join (",", "xnum", #order ),"\n";
foreach my $key ( sort keys %hash ) {
print join ( ",", $key, map { $_ // '' } #{$hash{$key}}{#order} ),"\n";
}
(Otherwise you'll get a warning about an undef value).

Perl doesn't guarantee the order of items in the hash, this is the root cause of the issue. Even two different hashes with the same keys can have different order of keys. It may also differ from platform to platform and architecture and perl version.
You need to define another array with the list of keys which you want to print in correct order.
my #keys = qw(st_responsible critical last_modified_by transstatus rt_res);
foreach my $element (#keys) {
... print the value
}
EDIT: As you're trying to write CSV file, consider using Text::CSV which takes care about special characters, correct formatting and things like that.

There's probably a slicker way of achieving this, but give this a go:
use warnings;
use strict;
open my $csv_out, '>', 'out.csv' or die $!;
my #keys = qw(2487 2564);
my #vals = qw(st_responsible critical last_modified_by transstatus rt_res);
print $csv_out "xnum,st_responsible,critical,last_modified_by,transstatus,rt_res\n";
for my $key (#keys){
print $csv_out "$key,";
for my $vals (#vals){
$vals eq $vals[-1] ? print $csv_out "$hash{$key}{$vals}\n" : print $csv_out "$hash{$key}{$vals},";
}
}
This will print out comma-separated values to a csv file out.csv maintaining your original order (by iterating over arrays). If it's the last value it will print a newline.
--- OUTPUT ---
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,teamname2,,
2564,mname1,,teamname1,,pname1

Related

Parse report in blocks to CSV

I have lots of data dumps in a pretty huge amount of data structured as follow
Key1:.............. Value
Key2:.............. Other value
Key3:.............. Maybe another value yet
Key1:.............. Different value
Key3:.............. Invaluable
Key5:.............. Has no value at all
Which I would like to transform to something like:
Key1,Key2,Key3,Key5
Value,Other value,Maybe another value yet,
Different value,,Invaluable,Has no value at all
I mean:
Generate a collection of all the keys
Generate a header line with all the Keys
Map all the values to their correct "columns" (notice that in this example I have no "Key4", and Key3/Key5 interchanged)
Possibly in Perl, since it would be easier to use in various environments.
But I am not sure if this format is unusual, or if there is a tool that already does this.
This is fairly easy using hashes and the Text::CSV_XS module:
use strict;
use warnings;
use Text::CSV_XS;
my #rows;
my %headers;
{
local $/ = "";
while (<DATA>) {
chomp;
my %record;
for my $line (split(/\n/)) {
next unless $line =~ /^([^:]+):\.+\s(.+)/;
$record{$1} = $2;
$headers{$1} = $1;
}
push(#rows, \%record);
}
}
unshift(#rows, \%headers);
my $csv = Text::CSV_XS->new({binary => 1, auto_diag => 1, eol => $/});
$csv->column_names(sort(keys(%headers)));
for my $row_ref (#rows) {
$csv->print_hr(*STDOUT, $row_ref);
}
__DATA__
Key1:.............. Value
Key2:.............. Other value
Key3:.............. Maybe another value yet
Key1:.............. Different value
Key3:.............. Invaluable
Key5:.............. Has no value at all
Output:
Key1,Key2,Key3,Key5
Value,"Other value","Maybe another value yet",
"Different value",,Invaluable,"Has no value at all"
If your CSV format is 'complicated' - e.g. it contains commas, etc. - then use one of the Text::CSV modules. But if it isn't - and this is often the case - I tend to just work with split and join.
What's useful in your scenario, is that you can map key-values within a record quite easily using a regex. Then use a hash slice to output:
#!/usr/bin/env perl
use strict;
use warnings;
#set paragraph mode - records are blank line separated.
local $/ = "";
my #rows;
my %seen_header;
#read STDIN or files on command line, just like sed/grep
while ( <> ) {
#multi - line pattern, that matches all the key-value pairs,
#and then inserts them into a hash.
my %this_row = m/^(\w+):\.+ (.*)$/gm;
push ( #rows, \%this_row );
#add the keys we've seen to a hash, so we 'know' what we've seen.
$seen_header{$_}++ for keys %this_row;
}
#extract the keys, make them unique and ordered.
#could set this by hand if you prefer.
my #header = sort keys %seen_header;
#print the header row
print join ",", #header, "\n";
#iterate the rows
foreach my $row ( #rows ) {
#use a hash slice to select the values matching #header.
#the map is so any undefined values (missing keys) don't report errors, they
#just return blank fields.
print join ",", map { $_ // '' } #{$row}{#header},"\n";
}
This for you sample input, produces:
Key1,Key2,Key3,Key5,
Value,Other value,Maybe another value yet,,
Different value,,Invaluable,Has no value at all,
If you want to be really clever, then most of that initial building of the loop can be done with:
my #rows = map { { m/^(\w+):\.+ (.*)$/gm } } <>;
The problem then is - you would need to build up the 'headers' array still, and that means a bit more complicated:
$seen_header{$_}++ for map { keys %$_ } #rows;
It works, but I don't think it's as clear about what's happening.
However the core of your problem may be the file size - that's where you have a bit of a problem, because you need to read the file twice - first time to figure out which headings exist throughout the file, and then second time to iterate and print:
#!/usr/bin/env perl
use strict;
use warnings;
open ( my $input, '<', 'your_file.txt') or die $!;
local $/ = "";
my %seen_header;
while ( <$input> ) {
$seen_header{$_}++ for m/^(\w+):/gm;
}
my #header = sort keys %seen_header;
#return to the start of file:
seek ( $input, 0, 0 );
while ( <$input> ) {
my %this_row = m/^(\w+):\.+ (.*)$/gm;
print join ",", map { $_ // '' } #{$this_row}{#header},"\n";
}
This will be slightly slower, as it'll have to read the file twice. But it won't use nearly as much memory footprint, because it isn't holding the whole file in memory.
Unless you know all your keys in advance, and you can just define them, you'll have to read the file twice.
This seems to work with the data you've given
use strict;
use warnings 'all';
my %data;
while ( <> ) {
next unless /^(\w+):\W*(.*\S)/;
push #{ $data{$1} }, $2;
}
use Data::Dump;
dd \%data;
output
{
Key1 => ["Value", "Different value"],
Key2 => ["Other value"],
Key3 => ["Maybe another value yet", "Invaluable"],
Key5 => ["Has no value at all"],
}

Manipulate and access the contents of a hash of hashes

I am having trouble with writing a Perl script.
This is the task:
My code works fine but has two issues.
I want to add an element to the hash %grocery, which contains category, brand and price. When adding the item, first the system will ask for the category.
If the category does not exist then it will add a new category, brand and price from the user, but if the category already exists then it will take the brand name and price from the user and append it to the existing category.
When I try to do so it erases the preexisting items. I want the previous items appended with the newly added item.
This issue is with the max value. To find the maximum price in the given hash. I am getting garbage value for that.
What am I doing wrong?
Here is my full code:
use strict;
use warnings;
use List::Util qw(max);
use feature "switch";
my $b;
my $c;
my $p;
my $highest;
print "____________________________STORE THE ITEM_____________________\n";
my %grocery = (
"soap" => { "lux" => 13.00, "enriche" => 11.00 },
"detergent" => { "surf" => 18.00 },
"cleaner" => { "domex" => 75.00 }
);
foreach my $c ( keys %grocery ) {
print "\n";
print "$c\n";
foreach my $b ( keys %{ $grocery{$c} } ) {
print "$b:$grocery{$c}{$b}\n";
}
}
my $ch;
do {
print "________________MENU_________________\n";
print "1.ADD ITEM\n";
print "2.SEARCH\n";
print "3.DISPLAY\n";
print "4.FIND THE MAX PRICE\n";
print "5.EXIT\n";
print "enter your choice \n";
$ch = <STDIN>;
chomp( $ch );
given ( $ch ) {
when ( 1 ) {
print "Enter the category you want to add";
$c = <STDIN>;
chomp( $c );
if ( exists( $grocery{$c} ) ) {
print "Enter brand\n";
$b = <STDIN>;
chomp( $b );
print "Enter price\n";
$p = <STDIN>;
chomp( $p );
$grocery{$c} = { $b, $p };
print "\n";
}
else {
print "Enter brand\n";
$b = <STDIN>;
chomp( $b );
print "Enter price\n";
$p = <STDIN>;
chomp( $p );
$grocery{$c} = { $b, $p };
print "\n";
}
}
when ( 2 ) {
print "Enter the item that you want to search\n";
$c = <STDIN>;
chomp( $c );
if ( exists( $grocery{$c} ) ) {
print "category $c exists\n\n";
print "Enter brand\n";
$b = <STDIN>;
chomp( $b );
if ( exists( $grocery{$c}{$b} ) ) {
print "brand $b of category $c exists\n\n";
print "-----$c-----\n";
print "$b: $grocery{$c}{$b}\n";
}
else {
print "brand $b does not exists\n";
}
}
else {
print "category $c does not exists\n";
}
}
when ( 3 ) {
foreach $c ( keys %grocery ) {
print "$c:\n";
foreach $b ( keys %{ $grocery{$c} } ) {
print "$b:$grocery{$c}{$b}\n";
}
}
}
when ( 4 ) {
print "\n________________PRINT HIGHEST PRICED PRODUCT____________________\n";
$highest = max values %grocery;
print "$highest\n";
}
}
} while ( $ch != 5 );
When I try to do so it erases the preexisting items. I want the previous items appended with the newly added item.
In this line you are overwriting the value of $grocery{$c} with a new hash reference.
$grocery{$c}={$b,$p};
Instead, you need to edit the existing hash reference.
$grocery{$c}->{$b} = $p;
That will add a new key $b to the existing data structure inside of $grocery{$b} and assign it the value of $p.
Let's take a look at what that means. I've added this to the code after %grocery gets initialized.
use Data::Dumper;
print Dumper \%grocery;
We will get the following output. Hashes are not sorted, so the order might be different for you.
$VAR1 = {
'cleaner' => {
'domex' => '75'
},
'detergent' => {
'surf' => '18'
},
'soap' => {
'enriche' => '11',
'lux' => '13'
}
};
As you can see we have hashes inside of hashes. In Perl, references are used to construct a multi level data structure. You can see that from the curly braces {} in the output. The very first one after $VAR1 is because I passed a reference of $grocery to Dumper by adding the backslash \ in front.
So behind the value for $grocery{"cleaner"} is a hash reference { "domex" => 75 }. To reach into that hash reference, you need to use the dereferencing operator ->. You can then put a new key into that hash ref like I showed above.
# ##!!!!!!!!!!
$grocery{"cleaner"}->{"foobar"} = 30;
I've marked the relevant parts above with a comment. You can read up on this stuff in these documents: perlreftut, perllol, perldsc and perlref.
This issue is with the max value. To find the max of the values of the given hash. I am getting garbage value for that.
This problem is also based on the fact that you don't yet understand references.
$highest = max values %grocery;
Your code will only take the values directly inside %grocery. If you scroll up and look at the Dumper output again, you'll see that there are three hash references inside of %grocery. Now if you do not dereference them, you just get their scalar representation. A scalar in Perl is a single value, like a number or a string. But for references it is their type and address. What looks like garbage is in fact the memory address of the three hash references in %grocery which has the highest number.
Of course that's not what you want. You need to iterate both levels of your data structure, collect all values and then find the highest one.
my #all_prices;
foreach my $category (keys %grocery) {
push #all_prices, values %{ $grocery{$category} };
}
$highest = max #all_prices;
print "$highest\n";
I chose a very verbose approach to do that. It iterates over all categories in %grocery and then grabs all the values of the hash reference stored behind each of them. Those get added to an array, and in the end we can take the max of all of them from the array.
You have the exact same code for the when a category already exists and when it does not. The line
$grocery{$c} = { $b, $p };
replaces the entire hash for category $c. That's fine for new categories, but if the category is already there then it will throw away any existing information
You need to write
$grocery{$c}{$b} = $p;
And please add a lot more whitespace around operators, separating the elements of lists, and delineating related sequences of statements
With regard to finding the maximum price, your line
$highest = max values %grocery;
is trying to calculate the maximum of the hash references corresponding to the categories
Since there are two levels of hash here, you need
$highest = max map { values %$_ } values %grocery;
but that may not be the way you're expected to do it. If in doubt then you should use two nested for loops
use List::Util qw(max);
use Data::Dumper;
my $grocery =
{
"soap" => { "lux" => 13.00, "enriche" => 11.00 },
"detergent" => { "surf" => 18.00 },
"cleaner" => { "domex"=> 75.00 }
};
display("unadulterated list");
print Dumper $grocery;
display("new silky soap");
$grocery->{"soap"}->{"silky"} = 12.50;
print Dumper $grocery;
display("new mega cleaner");
$grocery->{"cleaner"}->{"megaclean"} = 99.99;
print Dumper $grocery;
display("new exfoliant soap");
$grocery->{"soap"}->{"exfoliant"} = 23.75;
print Dumper $grocery;
display("lux soap gets discounted");
$grocery->{"soap"}->{"lux"} = 9.00;
print Dumper $grocery;
display("domex cleaner is discontinued");
delete $grocery->{"cleaner"}->{"domex"};
print Dumper $grocery;
display("most costly soap product");
my $max = max values $grocery->{soap};
print $max, "\n\n";
sub display
{
printf("\n%s\n%s\n%s\n\n", '-' x 45, shift, '-' x 45 );
}

Perl list all keys in hash with identical values

If I have a colon-delimited file name FILE and I do:
cat FILE|perl -F: -lane 'my %hash = (); $hash{#F[0]} = #F[2]'
to assign the first and 3rd tokens as the key => value pairs for the hash..
1) Is that a sane way to assign key value pairs to a hash?
2) What is the simplest way to now find all keys with shared values and list them?
Assume FILE looks like:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male
Desired Output: Keys with value "Apple": Mike,Jared,David,Sam
Your example won't work as you want because the -n option puts a while loop around your one-line program, so the hash you declare is created and destoyed for every record in the file. You could get around that by not declaring the hash, and so making it a persistent package variable which will retain all values stored in it.
You can then write push #{ $hash{$F[2]} }, $F[0] but notice that it should be $F[0] etc. and not #F[0], and I have used push to create a list of column 1 values for each column 3 value instead of just a list of one-to-one values relating each column 1 value with its column 3 value.
To clarify, your method produces a hash looking like this, which has to be searched to produce the display that you want.
(
Beth => "Maize",
David => "Apple",
Don => "Corn",
Jared => "Apple",
Mike => "Apple",
Sam => "Apple",
)
while mine creates this, which as you can see is pretty much already in the form you want.
(
Apple => ["Mike", "Jared", "Sam", "David"],
Corn => ["Don"],
Maize => ["Beth"],
)
But I think this problem is a bit too big to be solved with a one-line Perl program. The solution below expects the path to the input file as a command-line parameter, like this
> perl prog.pl colons.csv
but it will default to myfile.csv if no file is specified.
use strict;
use warnings;
our #ARGV = 'myfile.csv' unless #ARGV;
my %data;
while (<>) {
my #fields = split /:/;
push #{ $data{$fields[2]} }, $fields[0];
}
while (my ($k, $v) = each %data) {
next unless #$v > 1;
printf qq{Keys with value "%s": %s\n}, $k, join ', ', #$v;
}
output
Keys with value "Apple": Mike, Jared, Sam, David
use strict;
use warnings;
open my $in, '<', 'in.txt';
my %data;
while(<$in>){
chomp;
my #split = split/:/;
$data{$split[0]} = $split[2];
}
my $query = 'Apple';
print "Keys with value $query = ";
foreach my $name (keys %data){
print "$name " if $data{$name} eq $query;
}
print "\n";
Arrays are used to hold list of values, so use an array.
perl -F: -lane'
push #{ $h{$F[2]} }, $F[0];
END {
for my $fruit (keys %h) {
next if #{ $h{$fruit} } < 2;
print "$fruit: ", join(",", #{ $h{$fruit} });
}
}
' FILE
The END block is executed on exit. In it, we iterate over the keys of the hash. If the value of the current hash element is an array with only one element, it's skipped. Otherwise, we prints the key followed by contents of the array referenced by the hash element.
Here is another way:
perl -F: -lane'
push #{ $h{$F[2]} }, $F[0];
}{
print "$_: ", join(",", #{ $h{$_} }) for grep { #{$h{$_}} > 1 } keys %h;
' file
We read each line and create hash of arrays using third column as key and first column as list of values for matching key. In the END block we iterate over our hash using grep and filter keys whose array count greater than 1 and print the key followed by array elements.
It doesn't have to be a one liner,
Good. It's not going to be...
Is that a sane way to assign key value pairs to a hash?
You're simply assigning the key value pairs as:
$hash{"key"} = "value";
Which is about as simple as it gets. There might be a way of doing it via map. However, the main issue I see is what should happen if you have duplicate keys.
Let's say your file looks like this:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male # Note this entry is here twice!
David:35:Wheat:Male # Note this entry is here twice!
Let's do a simple assignment loop:
my %hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
}
When you get to $hash{David}, it will first be set to Apple, but then you change the value to Wheat. There are four ways you can handle this:
Use whatever the last value is. No change in the loop.
Use the first value and ignore subsequent values. Simple enough to do.
If that happens, it's an error. Abort the program and report the error.
Keep all values.
This last one is the most interesting because it involves a reference to an array as the values for your hash:
my %hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = [] if not exists $hash{$name}; # I'm making this an array reference
push #{ $hash{$name} }, $category;
}
Now, each value in my hash is a reference to an array:
my #values = #{ $hash{David} ); # The values of David...
print "David is in categories " . join ( ", ", #values ) . "\n";
This will print out David is in categories Wheat, Apple
What is the simplest way to now find all keys with shared values and list them?
The easiest way is to create a second hash that's keyed by your value. In this hash, you will need to use an array reference. Let's assume no duplicate names for now:
my %hash;
my %indexed_hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
my $indexed_hash{$category} = [] if not exist $indexed_hash{$category};
push #{ $indexed_hash{$category} }, $name;
}
Now, if I want to find all the duplicates of Apple:
my #names = #{ $indexed_hash{Apple} };
print "The following are in 'Apple': " . join ( ", " #names ) . "\n";
Since we're getting into references, we could take things a step further and store all of your values of your file in your hash. Again, for simplicity, I am assuming that you will have one and only one entry per name:
my %hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name}->{AGE} = $age;
$hash{$name}->{CATEGORY} = $category;
$hash{$name}->{SEX} = $sex;
}
for my $name ( sort keys %hash ) {
print "$name Information:\n";
print " Age: " . $hash{$name}->{AGE} . "\n";
printf "Category: %s\n", $hash{$name}->{CATEGORY};
print " Sex: #{[$hash{$name}->{SEX}]}\n\n";
}
That last two statements are easier ways of interpolating complex data structures into a string. The printf is fairly clear. The second #{[...]} is a neat little trick.
What have you tried?
If you reverse the hash into a list of value => key pairs then use List::Util's pairs() against the list, you can transform the hash into a hash of values => key arrayrefs. i.e. ( foo => [ 'bar', 'baz' ] ), grep {#{$hash{$_}} > 1} keys %hash, and print the results.

Build hash of hash in perl

I'm new to using perl and I'm trying to build a hash of a hash from a tsv. My current process is to read in a file and construct a hash and then insert it into another hash.
my %hoh = ();
while (my $line = <$tsv>)
{
chomp $line;
my %hash;
my #data = split "\t", $line;
my $id;
my $iter = each_array(#columns, #data);
while(my($k, $v) = $iter->())
{
$hash{$k} = $v;
if($k eq 'Id')
{
$id = $v;
}
}
$hoh{$id} = %hash;
}
print "dump: ", Dumper(%hoh);
This outputs:
dump
$VAR1 = '1234567890';
$VAR2 = '17/32';
$VAR3 = '1234567891';
$VAR4 = '17/32';
.....
Instead of what I would expect:
dump
{
'1234567890' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567890'
},
'1234567891' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567891'
},
........
};
My limited understanding is that when I do $hoh{$id} = %hash; its inserting in a reference to %hash? What am I doing wrong? Also is there a more succint way to use my columns and data array's as key,value pairs into my %hash object?
-Thanks in advance,
Niru
To get a reference, you have to use \:
$hoh{$id} = \%hash;
%hash is the hash, not the reference to it. In scalar context, it returns the string X/Y wre X is the number of used buckets and Y the number of all the buckets in the hash (i.e. nothing useful).
To get a reference to a hash variable, you need to use \%hash (as choroba said).
A more succinct way to assign values to columns is to assign to a hash slice, like this:
my %hoh = ();
while (my $line = <$tsv>)
{
chomp $line;
my %hash;
#hash{#columns} = split "\t", $line;
$hoh{$hash{Id}} = \%hash;
}
print "dump: ", Dumper(\%hoh);
A hash slice (#hash{#columns}) means essentially the same thing as ($hash{$columns[0]}, $hash{$columns[1]}, $hash{$columns[2]}, ...) up to however many columns you have. By assigning to it, I'm assigning the first value from split to $hash{$columns[0]}, the second value to $hash{$columns[1]}, and so on. It does exactly the same thing as your while ... $iter loop, just without the explicit loop (and it doesn't extract the $id).
There's no need to compare each $k to 'Id' inside a loop; just store it in the hash as a normal field and extract it afterwards with $hash{Id}. (Aside: Is your column header Id or id? You use Id in your loop, but id in your expected output.)
If you don't want to keep the Id field in the individual entries, you could use delete (which removes the key from the hash and returns the value):
$hoh{delete $hash{Id}} = \%hash;
Take a look at the documentation included in Perl. The command perldoc is very helpful. You can also look at the Perldoc webpage too.
One of the tutorials is a tutorial on Perl references. It all help clarify a lot of your questions and explain about referencing and dereferencing.
I also recommend that you look at CPAN. This is an archive of various Perl modules that can do many various tasks. Look at Text::CSV. This module will do exactly what you want, and even though it says "CSV", it works with tab separated files too.
You missed putting a slash in front of your hash you're trying to make a reference. You have:
$hoh{$id} = %hash;
Probably want:
$hoh{$id} = \%hash;
also, when you do a Data::Dumper of a hash, you should do it on a reference to a hash. Internally, hashes and arrays have similar structures when a Data::Dumper dump is done.
You have:
print "dump: ", Dumper(%hoh);
You should have:
print "dump: ", Dumper( \%hoh );
My attempt at the program:
#! /usr/bin/env perl
#
use warnings;
use strict;
use autodie;
use feature qw(say);
use Data::Dumper;
use constant {
FILE => "test.txt",
};
open my $fh, "<", FILE;
#
# First line with headers
#
my $line = <$fh>;
chomp $line;
my #headers = split /\t/, $line;
my %hash_of_hashes;
#
# Rest of file
#
while ( my $line = <$fh> ) {
chomp $line;
my %line_hash;
my #values = split /\t/, $line;
for my $index ( ( 0..$#values ) ) {
$line_hash{ $headers[$index] } = $values[ $index ];
}
$hash_of_hashes{ $line_hash{id} } = \%line_hash;
}
say Dumper \%hash_of_hashes;
You should only store a reference to a variable if you do so in the last line before the variable goes go of scope. In your script, you declare %hash inside the while loop, so placing this statement as the last in the loop is safe:
$hoh{$id} = \%hash;
If it's not the last statement (or you're not sure it's safe), create an anonymous structure to hold the contents of the variable:
$hoh{$id} = { %hash };
This makes a copy of %hash, which is slower, but any subsequent changes to it will not effect what you stored.

How can I convert these strings to a hash in Perl?

I wish to convert a single string with multiple delimiters into a key=>value hash structure. Is there a simple way to accomplish this? My current implementation is:
sub readConfigFile() {
my %CONFIG;
my $index = 0;
open(CON_FILE, "config");
my #lines = <CON_FILE>;
close(CON_FILE);
my #array = split(/>/, $lines[0]);
my $total = #array;
while($index < $total) {
my #arr = split(/=/, $array[$index]);
chomp($arr[1]);
$CONFIG{$arr[0]} = $arr[1];
$index = $index + 1;
}
while ( ($k,$v) = each %CONFIG ) {
print "$k => $v\n";
}
return;
}
where 'config' contains:
pub=3>rec=0>size=3>adv=1234 123 4.5 6.00
pub=1>rec=1>size=2>adv=111 22 3456 .76
The last digits need to be also removed, and kept in a separate key=>value pair whose name can be 'ip'. (I have not been able to accomplish this without making the code too lengthy and complicated).
What is your configuration data structure supposed to look like? So far the solutions only record the last line because they are stomping on the same hash keys every time they add a record.
Here's something that might get you closer, but you still need to figure out what the data structure should be.
I pass in the file handle as an argument so my subroutine isn't tied to a particular way of getting the data. It can be from a file, a string, a socket, or even the stuff below DATA in this case.
Instead of fixing things up after I parse the string, I fix the string to have the "ip" element before I parse it. Once I do that, the "ip" element isn't a special case and it's just a matter of a double split. This is a very important technique to save a lot of work and code.
I create a hash reference inside the subroutine and return that hash reference when I'm done. I don't need a global variable. :)
use warnings;
use strict;
use Data::Dumper;
readConfigFile( \*DATA );
sub readConfigFile
{
my( $fh ) = shift;
my $hash = {};
while( <$fh> )
{
chomp;
s/\s+(\d*\.\d+)$/>ip=$1/;
$hash->{ $. } = { map { split /=/ } split />/ };
}
return $hash;
}
my $hash = readConfigFile( \*DATA );
print Dumper( $hash );
__DATA__
pub=3>rec=0>size=3>adv=1234 123 4.5 6.00
pub=1>rec=1>size=2>adv=111 22 3456 .76
This gives you a data structure where each line is a separate record. I choose the line number of the record ($.) as the top-level key, but you can use anything that you like.
$VAR1 = {
'1' => {
'ip' => '6.00',
'rec' => '0',
'adv' => '1234 123 4.5',
'pub' => '3',
'size' => '3'
},
'2' => {
'ip' => '.76',
'rec' => '1',
'adv' => '111 22 3456',
'pub' => '1',
'size' => '2'
}
};
If that's not the structure you want, show us what you'd like to end up with and we can adjust our answers.
I am assuming that you want to read and parse more than 1 line. So, I chose to store the values in an AoH.
#!/usr/bin/perl
use strict;
use warnings;
my #config;
while (<DATA>) {
chomp;
push #config, { split /[=>]/ };
}
for my $href (#config) {
while (my ($k, $v) = each %$href) {
print "$k => $v\n";
}
}
__DATA__
pub=3>rec=0>size=3>adv=1234 123 4.5 6.00
pub=1>rec=1>size=2>adv=111 22 3456 .76
This results in the printout below. (The while loop above reads from DATA.)
rec => 0
adv => 1234 123 4.5 6.00
pub => 3
size => 3
rec => 1
adv => 111 22 3456 .76
pub => 1
size => 2
Chris
The below assumes the delimiter is guaranteed to be a >, and there is no chance of that appearing in the data.
I simply split each line based on '>'. The last value will contain a key=value pair, then a space, then the IP, so split this on / / exactly once (limit 2) and you get the k=v and the IP. Save the IP to the hash and keep the k=v pair in the array, then go through the array and split k=v on '='.
Fill in the hashref and push it to your higher-scoped array. This will then contain your hashrefs when finished.
(Having loaded the config into an array)
my #hashes;
for my $line (#config) {
my $hash; # config line will end up here
my #pairs = split />/, $line;
# Do the ip first. Split the last element of #pairs and put the second half into the
# hash, overwriting the element with the first half at the same time.
# This means we don't have to do anything special with the for loop below.
($pairs[-1], $hash->{ip}) = (split / /, $pairs[-1], 2);
for (#pairs) {
my ($k, $v) = split /=/;
$hash->{$k} = $v;
}
push #hashes, $hash;
}
The config file format is sub-optimal, shall we say. That is, there are easier formats to parse and understand. [Added: but the format is already defined by another program. Perl is flexible enough to deal with that.]
Your code slurps the file when there is no real need.
Your code only pays attention to the last line of data in the file (as Chris Charley noted while I was typing this up).
You also have not allowed for comment lines or blank lines - both are a good idea in any config file and they are easy to support. [Added: again, with the pre-defined format, this is barely relevant, but when you design your own files, do remember it.]
Here's an adaptation of your function into somewhat more idiomatic Perl.
#!/bin/perl -w
use strict;
use constant debug => 0;
sub readConfigFile()
{
my %CONFIG;
open(CON_FILE, "config") or die "failed to open file ($!)\n";
while (my $line = <CON_FILE>)
{
chomp $line;
$line =~ s/#.*//; # Remove comments
next if $line =~ /^\s*$/; # Ignore blank lines
foreach my $field (split(/>/, $line))
{
my #arr = split(/=/, $field);
$CONFIG{$arr[0]} = $arr[1];
print ":: $arr[0] => $arr[1]\n" if debug;
}
}
close(CON_FILE);
while (my($k,$v) = each %CONFIG)
{
print "$k => $v\n";
}
return %CONFIG;
}
readConfigFile; # Ignores returned hash
Now, you need to explain more clearly what the structure of the last field is, and why you have an 'ip' field without the key=value notation. Consistency makes life easier for everybody. You also need to think about how multiple lines are supposed to be handled. And I'd explore using a more orthodox notation, such as:
pub=3;rec=0;size=3;adv=(1234,123,4.5);ip=6.00
Colon or semi-colon as delimiters are fairly conventional; parentheses around comma separated items in a list are not an outrageous convention. Consistency is paramount. Emerson said "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines", but consistency in Computer Science is a great benefit to everyone.
Here's one way.
foreach ( #lines ) {
chomp;
my %CONFIG;
# Extract the last digit first and replace it with an end of
# pair delimiter.
s/\s*([\d\.]+)\s*$/>/;
$CONFIG{ip} = $1;
while ( /([^=]*)=([^>]*)>/g ) {
$CONFIG{$1} = $2;
}
print Dumper ( \%CONFIG );
}