Split multi-element string into hash - perl

I know that one can easily split a string into a hash using map like this question How to do I split a string into hash keys with undef values?, or this Perl Monks thread. So, something like this very easily works:
my %table = map { chomp; split(/\t/) } <DATA>;
dd \%table;
__DATA__
#1245 banana
#3499 cherry
#5290 notebook
#2112 compact_disc
Of course, this would result in:
{
"#1245" => "banana",
"#2112" => "compact_disc",
"#3499" => "cherry",
"#5290" => "notebook",
}
If one had a more complicated table of data, though, and wanted to make a hash of arrays using the second column as the key, is this possible with map, or does one have to use the "longer" form:
my %table;
while(<DATA>) {
chomp(my @elems = split(/\t/));
$table{$elems[1]} = \@elems;
}
dd \%table;
__DATA__
shelf1 #1245 banana Dole
shelf1 #3499 cherry Acme
shelf2 #5290 notebook Staples
shelf3 #2112 compact_disc Mercury_Records
to make:
{
"#1245" => ["shelf1", "#1245", "banana", "Dole"],
"#2112" => ["shelf3", "#2112", "compact_disc", "Mercury_Records"],
"#3499" => ["shelf1", "#3499", "cherry", "Acme"],
"#5290" => ["shelf2", "#5290", "notebook", "Staples"],
}
I tried these two approaches, but neither seems to work, and I'm guessing it's not possible. But, just out of curiosity (and education), I was wondering whether one can do it in a similar way.
my %table = map{ $_->[1] => @$_ } split(/\t/, <DATA>);
my %table = map{ split(/\t/); $_->[1], @$_ } <DATA>;

You can use map, but you need to move the split inside:
my %table = map { chomp; my @s = split /\t/; $s[1], \@s } <DATA>;
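For anyone who wants to try this without a __DATA__ section, here is a minimal self-contained sketch of the same idea. The @lines array and its tab-separated contents are made up for illustration; the only point is that each pass through the map block returns a (key, array reference) pair.

```perl
use strict;
use warnings;

# Hypothetical sample rows; fields are separated by real tab characters.
my @lines = (
    "shelf1\t#1245\tbanana\tDole\n",
    "shelf1\t#3499\tcherry\tAcme\n",
    "shelf2\t#5290\tnotebook\tStaples\n",
);

# Each iteration returns a (key, array-ref) pair; map flattens the pairs
# into the list that initialises the hash.
my %table = map {
    chomp(my $line = $_);            # work on a copy so @lines is untouched
    my @fields = split /\t/, $line;  # a fresh array on every iteration
    ( $fields[1] => \@fields );
} @lines;

print "$_ => @{ $table{$_} }\n" for sort keys %table;
```

Because @fields is declared inside the block, every key ends up with its own array rather than three references to the same one.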

Related

Perl: Add hash as sub hash to simple hash

I looked at the other two questions that seem to be about this, but they are a little obtuse and I can't relate them to what I want to do, which I think is simpler. I also think this will be a much clearer statement of a very common problem/task so I'm posting this for the benefit of others like me.
The Problem:
I have 3 files, each file a list of key=value pairs:
settings1.ini
key1=val1
key2=val2
key3=val3
settings2.ini
key1=val4
key2=val5
key3=val6
settings3.ini
key1=val7
key2=val8
key3=val9
No surprise, I want to read those key=value pairs into a hash to operate on them, so...
I have a hash of the filenames:
my %files = ( file1 => 'settings1.ini'
, file2 => 'settings2.ini'
, file3 => 'settings3.ini'
);
I can iterate through the filenames like so:
foreach my $fkey (keys %files) {
say $files{$fkey};
}
Ok.
Now I want to add the list of key=value pairs from each file to the hash as a sub-hash under each respective 'top-level' filename key, such that I can iterate through them like so:
foreach my $fkey (keys %files) {
say "File: $files{$fkey}";
foreach my $vkey (keys %{ $files{$fkey} }) {
say " $vkey: $files{$fkey}{$vkey}";
}
}
In other words, I want to add a second level to the hash such that it goes from just being (in pseudo terms) a single layer list of values:
file1 => settings1.ini
file2 => settings2.ini
file3 => settings3.ini
to being a multi-layered list of values:
file1 => key1 => 'val1'
file1 => key2 => 'val2'
file1 => key3 => 'val3'
file2 => key1 => 'val4'
file2 => key2 => 'val5'
file2 => key3 => 'val6'
file3 => key1 => 'val7'
file3 => key2 => 'val8'
file3 => key3 => 'val9'
Where:
my $fkey = 'file2';
my $vkey = 'key3';
say $files{$fkey}{$vkey};
would print the value
'val6'
As a side note, I am trying to use File::Slurp to read in the key=value pairs. Doing this on a single level hash is fine:
my %new_hash = read_file($files{$fkey}) =~ m/^(\w+)=([^\r\n\*,]*)$/img;
but - to rephrase this whole problem - what I really want to do is 'graft' the new hash of key=value pairs onto the existing hash of filenames 'under' the top $file key as a 'child/branch' sub-hash.
Questions:
How do I do this, how do I build a multi-level hash one layer at a time like this?
Can I do this without having to pre-define the hash as multi-layered up front?
I use strict; and so I have seen the
Can't use string ("string") as a HASH ref while "strict refs" in use at script.pl line <lineno>.
which I don't fully understand...
Edit:
Thank you Timur Shtatland, Polar Bear and Dave Cross for your great answers. In mentally parsing your suggestions it occurred to me that I had slightly misled you by being a little inconsistent in my original question. I apologize. I also think I see now why I saw the 'strict refs' error. I have made some changes.
Note that my first mention of the initial hash of filename is correct. The subsequent foreach examples looping through %files, however, were incorrect because I went from using file1 as the first file key to using settings1.ini as the first file key. I think this is why Perl threw the strict refs error - because I tried to change the key from the initial string to a hash_ref pointing to the sub-hash (or vice versa).
Have I understood that correctly?
There are several CPAN modules written for ini files. You should study what is available and choose whichever suits you best.
Otherwise you can write your own code, something in the spirit of the following snippet:
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my @files = qw(settings1.ini settings2.ini settings3.ini);
my %hash;
for my $file (@files) {
$hash{$file} = read_settings($file);
}
say Dumper(\%hash);
sub read_settings {
my $fname = shift;
my %hash;
open my $fh, '<', $fname
or die "Couldn't open $fname: $!";
while( <$fh> ) {
chomp;
my($k,$v) = split /=/, $_, 2;
$hash{$k} = $v;
}
close $fh;
return \%hash;
}
Output
$VAR1 = {
'settings1.ini' => {
'key2' => 'val2',
'key1' => 'val1',
'key3' => 'val3'
},
'settings2.ini' => {
'key2' => 'val5',
'key1' => 'val4',
'key3' => 'val6'
},
'settings3.ini' => {
'key1' => 'val7',
'key2' => 'val8',
'key3' => 'val9'
}
};
To build the hash one layer at a time, use anonymous hashes. Each value of %files here is a reference to a hash, for example, for $files{'settings1.ini'}:
# read the data into %new_hash, then:
$files{'settings1.ini'} = { %new_hash }
You do not need to predefine the hash as multi-layered (as hash of hashes) upfront.
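As a rough sketch of what that looks like in practice (the parse_pairs helper and the inline sample text are stand-ins I made up so the example runs without any files on disk): the "strict refs" error shows up when a value is still a plain string and gets treated as a hash, and it goes away once the string is replaced by a hash reference.

```perl
use strict;
use warnings;

# Hypothetical stand-in for reading a file: parse "key=value" lines from a string.
sub parse_pairs {
    my ($text) = @_;
    my %pairs = $text =~ /^(\w+)=(.*)$/mg;
    return \%pairs;
}

my %files = ( file1 => 'settings1.ini' );

# While the value is still a plain string, using it as a hash dies
# under "strict refs"; eval traps the error so we can look at it.
my $err = eval { my $v = $files{file1}{key1}; 1 } ? '' : $@;
print "error: $err" if $err;

# Replace the string with a hash reference; nothing has to be declared up front.
$files{file1} = parse_pairs("key1=val1\nkey2=val2\n");

# Autovivification creates the inner hash for a brand-new top-level key.
$files{file2}{key3} = 'val6';

print "$files{file1}{key2} $files{file2}{key3}\n";
```

So a given top-level key can hold either the filename string or the sub-hash, but not both; if you need both, keep the filename in a separate hash or store it under its own key inside the sub-hash.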
Also, avoid reinventing the wheel. Use Perl modules for common tasks; in this case, consider something like Config::IniFiles for parsing *.ini files.
SEE ALSO:
Anonymous hashes: perlreftut
Hashes of hashes: perldsc
Perl makes stuff like this ridiculously easy.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my %files;
# <> reads from the files given on the command line
# one line at a time.
while (<>) {
chomp;
my ($key, $val) = split /=/;
# $ARGV contains the name of the file that
# is currently being read.
$files{$ARGV}{$key} = $val;
}
say Dumper \%files;
Running this as:
$ perl readconf settings1.ini settings2.ini settings3.ini
Gives the following output:
$VAR1 = {
'settings3.ini' => {
'key2' => 'val8',
'key1' => 'val7',
'key3' => 'val9'
},
'settings2.ini' => {
'key3' => 'val6',
'key1' => 'val4',
'key2' => 'val5'
},
'settings1.ini' => {
'key3' => 'val3',
'key1' => 'val1',
'key2' => 'val2'
}
};

Accessing and modifying a nested hash based on a dot separated string

I have a string as input, say apple.mango.orange = 100
I also have a hash reference:
$inst = {
'banana' => 2,
'guava' => 3,
'apple' => {
'mango' => {
'orange' => 80
}
}
};
I want to modify the value of orange using the input string. Can someone please help me how I could do this?
I tried splitting the string into (key, value) pair. I then did the following on the key string:
my $key2 = "\$inst->{".$key."}";
$key2 =~ s/\./}->{/g;
$$key2 = $value;
This does not work as intended. Can someone help me out here? I have read the Perl FAQ about not using a variable value as variable but I am unable to think of an alternative.
You are building a string that consists of (buggy) Perl code, but you never ask Perl to execute it. Executing it would not be the right approach anyway. Walk the structure one key at a time instead:
sub dive_val :lvalue {
my $p = \shift;
$p = \($$p->{$_}) for @_;
$$p
}
my @key = split /\./, "apple.mango.orange";
dive_val($inst, @key) = $value;
or
use Data::Diver qw( DiveVal );
my @key = split /\./, "apple.mango.orange";
DiveVal($inst, map \$_, @key) = $value;
Not only is a symbolic reference a very bad idea here, it doesn't even solve your problem. You're building an expression in $key2, and just jamming another dollar sign in front of its name won't make perl execute that code. For that you would need eval, which is another bad idea
You can install and use the Data::Diver module, which does exactly this sort of thing, or you can simply loop over the list of hash keys, picking up a new hash reference each time and assigning the value to the element with the last key
The biggest issue is actually parsing the incoming string into a list of keys and a value. This code implements a subroutine apply which applies the implied operation in the string to a nested hash. Unless you are confident of your data, it needs some error checking added to make sure each of the keys in the list exists. The Data::Dumper output is just to demonstrate the validity of the result.
use strict;
use warnings 'all';
use Data::Dumper;
my $inst = { 'banana' => 2, 'guava' => 3, 'apple' => { 'mango' => { 'orange' => 80 } } };
my $s = 'apple.mango.orange = 100';
apply($s, $inst);
print Dumper $inst;
sub apply {
my ($operation, $data) = @_;
my ($keys, $val) = $operation =~ /([\w.]+)\s*=\s*(\d+)/;
my @keys = split /\./, $keys;
my $last = pop @keys;
my $hash = $data;
$hash = $hash->{$_} for @keys;
$hash->{$last} = $val;
}
output
$VAR1 = {
'banana' => 2,
'apple' => {
'mango' => {
'orange' => '100'
}
},
'guava' => 3
};
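The error checking mentioned above could look something like the following. This is only a sketch: the name apply_checked, the error messages, and the decision to reject keys that do not already exist are my own assumptions, not part of the original answer.

```perl
use strict;
use warnings;

# Variant of apply() that refuses to walk through missing or non-hash levels.
sub apply_checked {
    my ($operation, $data) = @_;
    my ($path, $val) = $operation =~ /^\s*([\w.]+)\s*=\s*(\S+)\s*$/
        or die "Cannot parse '$operation'\n";
    my @keys = split /\./, $path;
    my $last = pop @keys;
    my $hash = $data;
    for my $key (@keys) {
        # Reading $hash->{$key} here does not create the key.
        die "No nested hash at '$key' in '$path'\n"
            unless ref($hash->{$key}) eq 'HASH';
        $hash = $hash->{$key};
    }
    die "No existing key '$last' in '$path'\n" unless exists $hash->{$last};
    $hash->{$last} = $val;
    return;
}

my $inst = { banana => 2, apple => { mango => { orange => 80 } } };
apply_checked('apple.mango.orange = 100', $inst);
print "$inst->{apple}{mango}{orange}\n";

# A path that does not exist is rejected instead of silently creating hashes.
eval { apply_checked('apple.kiwi.orange = 5', $inst); 1 }
    or print "rejected: $@";
```

The unchecked loop would autovivify apple.kiwi as an empty hash on the second call; checking ref() before descending avoids that side effect.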

String Parsing for nested parenthesis in perl

The issue is that when I try to compare the input to the output file, I am unable to handle the nesting of the parentheses, and the complexity needs to be very low. Is there a parsing module for this that is compatible with 5.8.4? I found modules, but they needed at least 5.10. :(
Input
(K1=V1,K2=V2,K3=V3(K2=V2,K5=V5)K6=V6(K7=V7,K8=V8(K9=V9,K10=V10)K11=V11)K12=V12,K13=V13)
OUTPUT FILE
(K0=V0,K1=V1,K2=V2,K3=V3(K1=V1,K2=V2,K4=V4,K5=V5,K14=V14),K15=V15,K6=V6(K18=V18,K7=V7,K19=V19,K8=V8(K20=V20,K9=V9,K16=V16,K10=V10,K21=V21)K11=V11)K12=V12,K13=V13,K22=V22)
I need to pick up each key-value pair from the input and, one by one, verify against the output file that the value is the same. If not, I need to store the key with the existing value. (The issue is with the nesting.)
INPUT
K3=V3(K2=V2,K5=V5)
OUTPUT
K3=V3(K1=V1,K2=V2,K4=V4,K5=V5,K14=V14)
The issue is that "K2=V2" inside the V3 value is to be checked inside the V3 value in the output file. So I cannot just use a regular expression to do that as K2=V2 may appear outside the V3 parenthesis too.
I was trying to create a hash of a hash of a hash but failed. Could someone suggest a way I could achieve this?
The following code builds the hash of hashes. Note that values (V3) are lost if they contain an inner hash.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
sub to_hash {
my $string = shift;
$string =~ s/^\( | \)$//gx; # Remove the outer parentheses.
my @stack = ({});
my @keys;
while (length $string) {
$string =~ s/^([^,=()]+) = ([^(),]*)//x or die $string;
my ($key, $value) = ($1, $2);
$stack[-1]{$key} = $value;
next if $string =~ s/^,//;
if ($string =~ s/^\(//) {
push @stack, {};
push @keys, $key;
} elsif ($string =~ s/^\),?//) {
my $last = pop @stack;
$stack[-1]{ pop @keys } = $last;
}
}
return $stack[0]
}
my $input = '(K1=V1,K2=V2,K3=V3(K2=V2,K5=V5)K6=V6(K7=V7,K8=V8(K9=V9,K10=V10)K11=V11)K12=V12,K13=V13)';
print Dumper to_hash($input);
Output
$VAR1 = {
'K2' => 'V2',
'K13' => 'V13',
'K6' => {
'K7' => 'V7',
'K8' => {
'K9' => 'V9',
'K10' => 'V10'
},
'K11' => 'V11'
},
'K3' => {
'K2' => 'V2',
'K5' => 'V5'
},
'K12' => 'V12',
'K1' => 'V1'
};
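Once both the input and the output file are parsed into nested hashes like this, checking one against the other is a short recursive walk. The sketch below is an assumption about what "verify" should mean (every key in the input must exist at the same nesting level in the output with the same value); the function name mismatches and the hand-written sample hashes are purely illustrative. It avoids anything newer than Perl 5.8.

```perl
use strict;
use warnings;

# Return the paths of every key in $expected that is missing from $actual
# or holds a different value there. Sub-hashes are compared recursively, so
# a key is only ever looked up inside its own enclosing group.
sub mismatches {
    my ($expected, $actual, $path) = @_;
    $path = '' unless defined $path;
    my @bad;
    for my $key (sort keys %$expected) {
        my $here = $path eq '' ? $key : "$path/$key";
        my ($want, $got) = ($expected->{$key}, $actual->{$key});
        if (ref($want) eq 'HASH') {
            push @bad, ref($got) eq 'HASH'
                ? mismatches($want, $got, $here)
                : $here;
        }
        elsif (!defined($got) || ref($got) || $got ne $want) {
            push @bad, $here;
        }
    }
    return @bad;
}

# Hand-written stand-ins for the parsed input and output.
my $in_tree  = { K1 => 'V1', K3 => { K2 => 'V2', K5 => 'V5' } };
my $out_tree = { K0 => 'V0', K1 => 'V1',
                 K3 => { K1 => 'V1', K2 => 'CHANGED', K4 => 'V4', K5 => 'V5' } };

print "$_\n" for mismatches($in_tree, $out_tree);
```

Extra keys in the output (K0, K4 and so on) are ignored, and a K2 outside the K3 group would never be confused with the one inside it.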
Nested parens either suggests an application of Text::Balanced and its extract_bracketed function, or building yourself a little parser subclass on Parser::MGC. Using the latter to build a little "convert string into data structure" parser is usually pretty straightforward for simple examples like this.
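To give an idea of the Text::Balanced route (the module ships with Perl, as far as I know since the 5.8 series), here is a small sketch of extract_bracketed pulling one balanced group off the front of a string. The sample text is a fragment of the question's input; turning the pieces into a hash is left out.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# extract_bracketed expects the text to start with the opening delimiter
# (optionally preceded by whitespace).
my $text = '(K2=V2,K5=V5)K6=V6(K7=V7,K8=V8(K9=V9,K10=V10)K11=V11)';

# In list context the first two return values are the balanced group
# and whatever follows it.
my ($group, $rest) = extract_bracketed($text, '()');
print "group: $group\n";
print "rest:  $rest\n";
```

Calling it repeatedly on the remainder (after consuming the leading key=value text yourself) lets you peel off one nested group at a time.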

Populating a perl array of hashes with copies of a single hash

I have searched and searched and I can't get any of the code I've found to work. I'm sorry if this is repeating old ground, but I've now spent 2 days trying to get these 10 lines to work and I am at my wits' end with no hair left :-(
I am running Perl 5.8.8.
I want to populate an array of hashes in Perl such that it contains multiple copies of a single hash variable I am updating. My code is here:
use strict;
use warnings;
my @array;
my %tempHash = (state => "apple", symbol => "54", memberId => "12345");
push(@array, \%tempHash);
%tempHash = (state => "tiger", symbol => "22", memberId => "12345");
push(@array, \%tempHash);
%tempHash = (state => "table", symbol => "37", memberId => "12345");
push(@array, \%tempHash);
printf("%p %p %p\n", $array[0], $array[1], $array[2]);
foreach my $entry (@array){
printf("state: %s\n", $entry->{state});
printf("memberId: %s\n", $entry->{memberId});
printf("symbol: %s\n\n", $entry->{symbol});
}
This produces the following output:
1868954 18688d0 18688c4
state: table
memberId: 12345
symbol: 37
state: table
memberId: 12345
symbol: 37
state: table
memberId: 12345
symbol: 37
So it looks to me like the scalar values in the array are different. Yet the values in the hashes these scalars point to are all the same.
Thanks in advance for your help.
1) The code you posted doesn't work under use strict;, did you mean %tempHash and %hash are really the same variable?
2) If you use %s instead of %p, you'll get 3 identical HASH(0x1234abcd) strings, which means the contents of the array are indeed references to the same hash.
3) I would suggest creating a new anonymous hash each time:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my @array;
my %tempHash = (state => "apple", symbol => "54",memberId => "12345");
push(@array, { %tempHash });
%tempHash = (state => "tiger", symbol => "22", memberId => "12345");
push(@array, { %tempHash });
%tempHash = (state => "table", symbol => "37", memberId => "12345");
push(@array, { %tempHash });
print Dumper( \@array );
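If it helps to see the difference directly, here is a tiny sketch (variable names are my own) that compares references numerically. Two references are equal under == only when they point at the same underlying hash.

```perl
use strict;
use warnings;

my %temp = (state => "apple");
my @shared = (\%temp, \%temp);          # two references to one hash
my @copies = ({ %temp }, { %temp });    # two independent shallow copies

%temp = (state => "tiger");             # overwrite the original in place

# References compare equal numerically when they point to the same thing.
print "shared refs identical: ", ($shared[0] == $shared[1] ? "yes" : "no"), "\n";
print "copies identical: ",      ($copies[0] == $copies[1] ? "yes" : "no"), "\n";
print "shared sees: $shared[0]{state}\n";   # follows the overwrite
print "copy keeps:  $copies[0]{state}\n";   # still the value at copy time
```

Note that { %temp } is a shallow copy: if the values were themselves references, the copies would still share those inner structures.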
It sounds like you are fetching data a line at a time from a CSV file using Text::CSV.
Suppose your code is like this
my %tempHash;
my @array;
while (my $line = $csv->getline($fh)) {
# Add values to %tempHash;
push @array, \%tempHash;
}
then you could solve your problem very simply by declaring %tempHash inside the while loop
my @array;
while (my $line = $csv->getline($fh)) {
my %tempHash;
# Add values to %tempHash;
push @array, \%tempHash;
}
because Perl creates a new lexical hash each time the block is entered
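A runnable sketch of that pattern, with a hard-coded @rows list standing in for what $csv->getline would return (that list and the field layout are assumptions for the example):

```perl
use strict;
use warnings;

# Hypothetical input rows standing in for parsed CSV lines.
my @rows = ( [ "apple", 54 ], [ "tiger", 22 ], [ "table", 37 ] );

my @array;
for my $row (@rows) {
    # Declared inside the loop, so each pass gets a fresh hash.
    my %record = ( state => $row->[0], symbol => $row->[1] );
    push @array, \%record;
}

print join(", ", map { $_->{state} } @array), "\n";
```

Each element of @array now refers to a different hash, so later iterations no longer overwrite earlier entries.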
Update
If the data isn't necessarily complete after each input record, then write
my @array;
my $data = {};
while ( my $line = $csv->getline($fh) ) {
# use information from $line to supplement $data
if ( record_is_complete($data) ) { # record_is_complete is a placeholder for your own test
push @array, $data;
$data = {};
}
}
If you'd added use strict and use warnings to your script, it would have told you what's wrong:
First, you're filling the hash %tempHash and storing a reference to it. Next you create a new hash, %hash, which you fill BUT NEVER USE! Instead, you add new references to %tempHash...
My bet is that the push function just does not care about references. See perldoc:
Starting with Perl 5.14, push can take a scalar EXPR, which must hold
a reference to an unblessed array. The argument will be dereferenced
automatically. This aspect of push is considered highly experimental.
The exact behaviour may change in a future version of Perl.
EDIT
You could try to not use push:
my @array;
my %tempHash = (state => "apple", symbol => "54", memberId => "12345");
$#array=4;
$array[0]=\%tempHash;
$array[1]=\%tempHash;
$array[2]=\%tempHash;
printf("%p %p %p\n", $array[0], $array[1], $array[2]);
foreach my $entry (@array){
printf("state: %s\n", $entry->{state});
printf("memberId: %s\n", $entry->{memberId});
printf("symbol: %s\n\n", $entry->{symbol});
}
with same result as my perl interpreter just told me :-( (perl 5.14.2)

How do I find and count duplicate values in a perl hash

I need to find the duplicate values in a perl hash and then output the key/value pair and associated dup count when that count is > 1.
(I could leave a code sample of what I've attempted but that would just result in mass confusion and possibly uncontrolled laughter, and I'm really hoping to make it through life with some semblance of self esteem.)
Hash key/value would look like the following:
%hash = ('FHDJ-124H' => 'hostname1', 'HJDHUR-87878' => 'hostname2', 'HGHDJH-874673' => 'hostname1');
My desired output would be:
2 duplicates found for hostname1
FHDJ-124H
HGHDJH-874673
Using perl 5.6 on Solaris 10. Tightly controlled production environment where upgrading or loading perl mods is not allowed. (A change request for moving to 5.8 is about 6 months out).
Many thanks!
You need to iterate through the hash keys in your first hash (key/value) and accumulate the count of each item you find in another hash (value/count).
If you want to display the keys together with duplicated values, your second hash cannot be as simple as that, since for each duplicated value you will have a collection of keys (all of them having the same value). In this case, simply accumulate the key in an array, then count its elements. I.e., your second hash would be something like (value/[key1,key2,key3...])
my %hash = ( key1 => "one", key2 => "two", key3 => "one", key4 => "two", key5 => "one" );
my %counts = ();
foreach my $key (sort keys %hash) {
my $value = $hash{$key};
if (not exists $counts{$value}) {
$counts{$value} = [];
}
push @{ $counts{$value} }, $key;
};
Then iterate over %counts and output what you need wherever the number of elements in $counts{$value} is greater than 1.
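Putting both steps together, a sketch that produces the report format asked for in the question might look like this. The names %keys_for and @report are my own, and the code sticks to constructs that should be available on Perl 5.6.

```perl
use strict;
use warnings;

my %hash = (
    'FHDJ-124H'     => 'hostname1',
    'HJDHUR-87878'  => 'hostname2',
    'HGHDJH-874673' => 'hostname1',
);

# Step 1: group the keys by their value.
my %keys_for;
for my $key (sort keys %hash) {
    push @{ $keys_for{ $hash{$key} } }, $key;
}

# Step 2: report only the values shared by more than one key.
my @report;
for my $value (sort keys %keys_for) {
    my @keys = @{ $keys_for{$value} };
    next unless @keys > 1;
    push @report, scalar(@keys) . " duplicates found for $value", @keys;
}
print "$_\n" for @report;
```

Sorting the keys in both loops is optional; it just makes the output order repeatable, since hash order is not.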
This is what you are looking for
#!/usr/bin/perl
use strict;
use warnings;
my %hash = ('FHDJ-124H' => 'hostname1', 'HJDHUR-87878' => 'hostname2', 'HGHDJH-874673' => 'hostname1');
my %reverse;
while (my ($key, $value) = each %hash) {
push @{$reverse{$value}}, $key;
}
while (my ($key, $value) = each %reverse) {
next unless @$value > 1;
print scalar(@$value), " duplicates found \n @$value have the same value $key\n";
}
What about:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dump qw(dump);
my %h = (a=>'v1', b=>'v2', c=>'v1', d=>'v3', e=>'v3');
my %r;
while(my($k,$v)=each%h){
push @{$r{$v}}, {$k=>$v};
}
dump %r;
output:
(
"v1",
[{ c => "v1" }, { a => "v1" }],
"v2",
[{ b => "v2" }],
"v3",
[{ e => "v3" }, { d => "v3" }],
)
Well, off of the top of my head, you could do something like this:
my @values=sort(values(%hash));
my @doubles=();
my %counts=();
foreach my $i (0..$#values)
{
foreach my $j (($i+1)..$#values)
{
if($values[$i] eq $values[$j])
{
push @doubles,$values[$i];
$counts{$values[$i]}++;
}
}
}
foreach(@doubles)
{
print "$hash{$_}, $_, $counts{$_}\n";
}
This is a bit of a naive solution (that I haven't tested, yet), and I'm sure there's a faster and slicker way, but this should work.