Perl array of hash structures


This is a design question. As far as I know, Perl has no arrays of arrays. I am writing code that reads data from large text files, recorded at phases of something in flight. Each phase tracks different variables (and a different number of them). I have to store them because, in the second part of the script, I rewrite them into another file that I update as I read.
At first I thought I should use an array of hashes, but the variables are not the same at each phase. Then I thought of an array holding the names of several arrays (an array of references, I guess).
Sample data would be similar to:
phase 100.00 mass 0.9900720175260495E+005
phase 240.00 gcrad 61442116.0 long 0.963710076E+003 gdalt 0.575477727E+002 vell 0.9862937759999998E+002
The data is made up, but you should get the idea: there would be many phases, with anywhere from 1 to 25 variables in each.

You can use Arrays of Arrays in Perl. You can find documentation on Perl data structures including Arrays of Arrays here: http://perldoc.perl.org/perldsc.html. That said, looking at the sample you've provided it looks like what you need is an Array of Hashes. Perhaps something like this:
my @data = (
{ phase => 100.00,
mass => 0.9900720175260495e005 },
{ phase => 240.00,
gcrad => 61442116.0,
long => 0.963710076e003,
gdalt => 0.575477727e002,
vell => 0.9862937759999998e002 }
);
To access the data you would use:
$data[0]->{phase} # => 100.00
You could also use a Hash of Hashes like this:
my %data = (
name1 => {
phase => 100.00,
mass => 0.9900720175260495e005
},
name2 => {
phase => 240.00,
gcrad => 61442116.0,
long => 0.963710076e003,
gdalt => 0.575477727e002,
vell => 0.9862937759999998e002
}
);
to access the data you would use:
$data{name1}->{phase} # => 100.00
A great resource for learning how to implement advanced data structures and algorithms in Perl is the book Mastering Algorithms in Perl.
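For the file format in the question, filling such an array of hashes could be sketched as follows. This is a minimal sketch, assuming every line is a whitespace-separated run of alternating names and values, as in the sample data. The `parse_phase_line` helper is hypothetical, and the sample lines are inlined here instead of being read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one line of "name value name value ..." into a hashref.
# Assumes names and values alternate and contain no embedded whitespace.
sub parse_phase_line {
    my ($line) = @_;
    my @fields = split ' ', $line;
    die "Odd number of fields in line: $line\n" if @fields % 2;
    my %record = @fields;    # a list of pairs becomes a hash
    return \%record;
}

# Sample lines taken from the question; a real script would read them from a file.
my @lines = (
    'phase 100.00 mass 0.9900720175260495E+005',
    'phase 240.00 gcrad 61442116.0 long 0.963710076E+003 gdalt 0.575477727E+002 vell 0.9862937759999998E+002',
);

my @data = map { parse_phase_line($_) } @lines;

# Each phase keeps only the variables it actually has.
for my $rec (@data) {
    my @names = sort grep { $_ ne 'phase' } keys %$rec;
    print "phase $rec->{phase}: @names\n";
}
```

Because each record is its own hash, it does not matter that the phases carry different variables or different numbers of them.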

I use the following mnemonic when defining arrays, array references and hash references:
Use parentheses for lists -- lists can be assigned to either arrays or hashes:
my %person = (
given_name => 'Zaphod',
surname => 'Beeblebrox'
);
or
my @rainbow = (
'red',
'orange',
'yellow',
'green',
'blue',
'indigo',
'violet'
);
Because the lists are assigned to list types -- array and hash, there is no semantic ambiguity. When you deal with array references or hash references, however, the delimiter must distinguish between the types of reference, because the $ sigil for scalar variables can't be used to distinguish between the two types of reference. Therefore, [] is used to denote array references, just as [] is used to dereference arrays, and {} is used to denote hash references, just as {} is used to dereference hashes.
So an array of arrayrefs looks like this:
my @AoA = (
[1,2,3],
[4,5,6],
[7,8,9]
);
An array of hashrefs:
my @AoH = (
{ given_name => 'Ford', surname => 'Prefect' },
{ given_name => 'Arthur', surname => 'Dent' }
);
A hashref assigned to a scalar variable:
my $bones = {
head => 'skull',
jaw => 'mandible',
'great toe' => 'distal phalanx'
};

Related

Difference between a direct perl hash reference and a hash that is turned into a reference

I am trying to understand an example of code given here: https://www.perlmonks.org/?node_id=1083257 and the difference between the directly created hash references given in the example and one that I alternatively create first as a hash. When I run the following code:
use strict;
use warnings;
use Algorithm::NaiveBayes;
my $positive = {
remit => 2,
due => 4,
within => 1,
};
my $negative = {
assigned => 3,
secured => 1,
};
my $categorizer = Algorithm::NaiveBayes->new;
$categorizer->add_instance(
attributes => $positive,
label => 'positive');
$categorizer->add_instance(
attributes => $negative,
label => 'negative');
$categorizer->train;
my $sentence1 = {
due => 2,
remit => 1,
};
my $probability = $categorizer->predict(attributes => $sentence1);
print "Probability positive: $probability->{'positive'}\n";
print "Probability negative: $probability->{'negative'}\n";
I get the result:
Probability positive: 0.999500937781821
Probability negative: 0.0315891654410057
However, when I try to create the hash reference in the following way:
my %sentence1 = {
"due", 2,
"remit", 1
};
my $probability = $categorizer->predict(attributes => \%sentence1);
I get:
Reference found where even-sized list expected at simple_NaiveBayes.pl line 57.
Probability positive: 0.707106781186547
Probability negative: 0.707106781186547
Why is my hash \%sentence1 different from the $sentence1 hash reference given in the example?

Just like the warning says, you are assigning a reference where an even-sized list was expected. This
my %sentence1 = { # curly braces create a hash reference, a scalar value
"due", 2,
"remit", 1
};
Should look like this
my %sentence1 = ( # parentheses used to return a list (*)
"due", 2,
"remit", 1
);
(*): Parentheses do not actually create a list; they only override precedence, in this case the precedence of = over , and =>, so that the whole comma-separated list reaches the assignment.
Or this
my $sentence1 = { # scalar variable is fine for a reference
"due", 2,
"remit", 1
};
If you want to get technical, what is happening in your hash assignment is that the hash reference { ... } is stringified and turned into a hash key. (The string will be something like HASH(0x104a7d0)) The value of that hash key will be undef, because the "list" you assigned to the hash only contained 1 thing: The hash reference. (Hence the "even-sized list" warning) So if you print such a hash with Data::Dumper, you would get something that looks like this:
$VAR1 = {
'HASH(0x104a7d0)' => undef
};
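The relationship between the two correct forms can be sketched as below. This is only an illustration; the variable names are made up and not taken from the original script.

```perl
use strict;
use warnings;

my %sentence = ( due => 2, remit => 1 );    # a hash, built from a list
my $ref      = \%sentence;                  # reference to that same hash
my $anon     = { due => 2, remit => 1 };    # reference to a new anonymous hash

$ref->{due} = 5;                            # changes %sentence as well
print "$sentence{due}\n";                   # prints 5

my %copy = %$anon;                          # dereference to copy into a plain hash
$copy{remit} = 9;                           # does not affect the hash behind $anon
print "$anon->{remit}\n";                   # prints 1
```

Either `\%sentence` or `$anon` can be passed wherever a hash reference is expected.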

my %sentence1 = {
"due", 2,
"remit", 1
};
You did this wrong (you tried to create a hash with one key, which is a hashref (which doesn't work) and no corresponding value). That's why perl gave you a warning about finding a reference instead of an even-sized list. You wanted
my %sentence1 = (
"due", 2,
"remit", 1
);

Losing some values in Perl script

I have several hashes. The following ones hold values:
$data->{reports}->{$port}->{tb}->{tb}
$data->{reports}->{$port}->{tb}->{change}
$data->{reports}->{$port}->{tb_pbd}->{tb_pbd}
The values are:
$VAR1 = {
'4|EXPENSES|Net Income' => '8658617.49'
};
$VAR1 = {
'4|EXPENSES|Net Income' => '8728605.17'
};
$VAR1 = {
'4|EXPENSES|Net Income' => '-69987.68'
};
A separate variable, $keyee, has the value:
1|ASSETS|11240-000
However, when I use this value as a key in any of the hashes, for example:
$data->{reports}->{$port}->{tb}->{tb}->{$keyee}
the result is undef. Any idea why, and how to handle it?

A hash key can hold only one value. To store multiple values under a single key, you need a hash of arrays.
Code example:
my %myhash;
$myhash{'tb'} = "val1";
$myhash{'tb'} = "val2";
In the above example the tb key holds only one value, the last one assigned.
If you want to keep both values associated with the tb key, use a hash of arrays.
Code example, starting from an empty %myhash:
push(@{ $myhash{'tb'} }, "val1");
push(@{ $myhash{'tb'} }, "val2");
Hope this helps.
For more information about hashes of arrays, visit: http://docstore.mik.ua/orelly/perl2/prog/ch09_02.htm
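A self-contained version of the hash-of-arrays idea might look like this. It is a sketch only: the key `other` is made up, and the `exists` check is an added suggestion for telling a missing key apart from a stored value, not something taken from the original scripts.

```perl
use strict;
use warnings;

my %myhash;

# The array behind $myhash{tb} is created on the first push (autovivification).
push @{ $myhash{tb} }, 'val1';
push @{ $myhash{tb} }, 'val2';

# Check that a key exists before reading through it, so that a missing
# key is not mistaken for a value that was stored and then lost.
if ( exists $myhash{tb} ) {
    print "tb: @{ $myhash{tb} }\n";    # prints "tb: val1 val2"
}
print "no such key\n" unless exists $myhash{other};
```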

What is the best way to store 1key - 3 value in Perl?

I have a situation where I have 3 different values for each key. I have to print the data like this:
K1 V1 V2 V3
K2 V1 V2 V3
…
Kn V1 V2 V3
Is there an alternative, more efficient and easier way to achieve this than those listed below? I am thinking of 2 approaches:
(1) Maintain 3 hashes for the 3 different values of each key. Iterate through one hash by key, get the values from the other 2 hashes, and print them.
Hash 1 - K1-->V1 ...
Hash 2 - K1-->V2 ...
Hash 3 - K1-->V3 ...
(2) Maintain a single hash mapping each key to a reference to an array of values. Here I need to iterate over and read only 1 hash.
K1 --> Ref{V1,V2,V3}
EDIT1:
The main challenge is that, the values V1, V2, V3 are derived at different places and cannot be pushed together as the array. So if I make the hash value as a reference to array, I have to dereference it every time I want to add the next value.
E.g., I am in subroutine1 - I populated Hash1 - K1-->[V1]
I am in subroutine2 - I have to de-reference [V1], then push V2. So now the hash
becomes K1-->[V1 V2], V3 is added in another routine. K1-->[V1 V2 V3]
EDIT2:
Now I am facing another challenge. I have to sort the hash based on the V3.
Still is it feasible to store the hash with key and list reference?
K1-->[V1 V2 V3]

It really depends on what you want to do with your data, although I can't imagine your option 1 being convenient for anything.
Use a hash of arrays if you are happy referring to your V1, V2, V3 using indexes 0, 1, 2 or if you never really want to handle their values separately.
my %data;
$data{K1}[0] = V1;
$data{K1}[1] = V2;
$data{K1}[2] = V3;
or, of course
$data{K1} = [V1, V2, V3];
As an additional option, if your values mean something nameable you could use a hash of hashes, so
my %data;
$data{K1}{name} = V1;
$data{K1}{age} = V2;
$data{K1}{height} = V3;
or
@{$data{K1}}{qw/ name age height /} = (V1, V2, V3);
Finally, if you never need access to the individual values, it would be fine to leave them as they are in the file, like this
my %data;
$data{K1} = "V1 V2 V3";
But as I said, the internal storage is mostly dependent on how you want to access your data, and you haven't told us about that.
Edit
Now that you say
The main challenge is that, the values V1, V2, V3 are derived at
different places and cannot be pushed together as the array
I think perhaps the hash of hashes is more appropriate, but I wouldn't worry at all about dereferencing as it is an insignificant operation as far as execution time is concerned. But I wouldn't use push as that restricts you to adding the data in the correct order.
Depending which you prefer, you have the alternatives of
$data{K1}[2] = V3;
or
$data{K1}{height} = V3;
and clearly the latter is more readable.
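Filling a hash of hashes from several places, in any order, can be sketched as below. The three `record_*` subroutines are hypothetical stand-ins for the separate routines mentioned in the question, and the values come from the example data in this answer.

```perl
use strict;
use warnings;

my %data;

# Hypothetical subroutines standing in for the separate places where values arise.
# The inner hash for a key is created automatically on first assignment.
sub record_name   { my ( $key, $v ) = @_; $data{$key}{name}   = $v; return }
sub record_age    { my ( $key, $v ) = @_; $data{$key}{age}    = $v; return }
sub record_height { my ( $key, $v ) = @_; $data{$key}{height} = $v; return }

# The order of the calls does not matter, unlike with push.
record_height( K1 => 64 );
record_name( K1 => 'ABC' );
record_age( K1 => 99 );

# A hash slice pulls the values out in a fixed order for printing.
print join( ' ', 'K1', @{ $data{K1} }{qw/ name age height /} ), "\n";
```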
Edit 2
As requested, to sort a hash of hashes by the third value (height in my example) you would write
use strict;
use warnings;
my %data = (
K1 => { name => 'ABC', age => 99, height => 64 },
K2 => { name => 'DEF', age => 12, height => 32 },
K3 => { name => 'GHI', age => 56, height => 9 },
);
for (sort { $data{$a}{height} <=> $data{$b}{height} } keys %data) {
printf "%s => %s %s %s\n", $_, @{$data{$_}}{qw/ name age height /};
}
or, if the data was stored as a hash of arrays
use strict;
use warnings;
my %data = (
K1 => [ 'ABC', 99, 64 ],
K2 => [ 'DEF', 12, 32 ],
K3 => [ 'GHI', 56, 9 ],
);
for (sort { $data{$a}[2] <=> $data{$b}[2] } keys %data) {
printf "%s => %s %s %s\n", $_, @{$data{$_}};
}
The output for both scripts is identical
K3 => GHI 56 9
K2 => DEF 12 32
K1 => ABC 99 64

In terms of readability/maintainability the second seems superior to me. The danger with the first is that you could end up with keys present in one hash but not the others. Also, if I came across the first approach, I'd have to think about it for a while, whereas the second seems "natural" (or a more common idiom, or more practical, or something else which means I'd understand it more readily).

The second approach (one array reference for each key) is:
In my experience, far more common,
Easier to maintain, since you only have one data structure floating around instead of three, and
More in line with the DRY principle: "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system." Represent a key once, not three times.

Sure, it's better to maintain only one data structure:
%data = ( K1=>[V1, V2, V3], ... );
You can use Data::Dump for a fast view/debug of your data structure.

The choice really depends on the usage pattern. Specifically, it depends on whether you use procedural programming or object-oriented programming.
This is a philosophical difference, and it's unrelated to whether language-level classes and objects are used or not. Procedural programming is organised around work flow; procedures access and transform whatever data they need. OOP is organised around records of data; methods access and transform one particular record only.
The second approach is closely aligned with object-oriented programming. Object-oriented programming is by far the most common programming style in Perl, so the second approach is almost universally the preferred structure these days (even though it takes more memory).
But your edit implied you might be using a more procedural approach. As you discovered, the first approach is more convenient for procedural programming. It was very commonly used when procedural programming was in vogue.
Take whatever suits your code's organisation best.

How can I substitute an array for macro like keyword in Perl?

I am defining a lot of arrays of structs in a module. e.g.
my $array = [
{
Field1 => "FieldValue1"
},
{
#etc...
},
];
my $array2 = [
{
Field1 => "FieldValue1"
},
{
#etc...
},
];
I often repeat sequences of structs. For instance I might have five { Field1 => "FieldValue1" } structs in a row. Is it possible to save the sequence of structs in some data structure and insert that into my arrays?
e.g.
my $array3 = [ $Field1, $Field1, $Field1 ]; # $Field1 is a sequence of structs

You can do that, but every element will be a reference to the same hash rather than a copy, so editing the first one will change all of them. Instead, use map to build a new hash for each element.
my $array3 = [ map {Field1 => "FieldValue1"}, 1..5 ];
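The difference between repeating one reference and building fresh hashes can be seen in a small sketch. The names `$field`, `$shared` and `$fresh` are illustrative only.

```perl
use strict;
use warnings;

my $field  = { Field1 => 'FieldValue1' };
my $shared = [ $field, $field, $field ];                         # three references to one hash
my $fresh  = [ map { +{ Field1 => 'FieldValue1' } } 1 .. 3 ];    # three separate hashes

$shared->[0]{Field1} = 'changed';
$fresh->[0]{Field1}  = 'changed';

print "$shared->[2]{Field1}\n";    # prints "changed": same hash as element 0
print "$fresh->[2]{Field1}\n";     # prints "FieldValue1": independent hash
```

The `+{ ... }` inside the map block is one way to make it unambiguous that the inner braces build an anonymous hash.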

Any time that you find yourself repeating boilerplate code, Perl usually has a way around it.
I am not entirely clear what you want to do, but you could do something like this:
sub make_struct {
{Field1 => "FieldValue1"}
}
my $array = [map make_struct, 1 .. 10]; # array with 10 hashes
sub make_struct_array {[map make_struct, 1 .. $_[0]]}
my $array2 = make_struct_array 20; # array with 20 hashes
So in other words, write a subroutine that returns a new data structure for you. The subroutine can take a variety of options if you need to customize the structure.

The answers above work well for their own purposes, but they were not exactly what I wanted.
I ended up using push() to create the arrays. $templatearray1 and $templatearray2 are references to arrays of structs. Because they are dereferenced with @, push() does not insert the array reference itself; it inserts the elements of the arrays.
e.g.
my $myarray = [];
push(@$myarray, @$templatearray1);
push(@$myarray, @$templatearray2);
push(@$myarray, @$templatearray1);
push(@$myarray, @$templatearray2);
push(@$myarray, @$templatearray1);
push(@$myarray, @$templatearray2);
push(@$myarray, (
{
key1 => 'blah1',
key2 => 'blah2',
},
));

Modifying hash within a hash in Perl

What is the shortest amount of code to modify a hash within a hash in the following instances:
$hash{'a'} = { 1 => 'one',
2 => 'two' };
(1) Add a new key to the inner hash of 'a' (ex: c => 4 in the inner hash of 'a')
(2) Changing a value in the inner hash (ex: change the value of 1 to 'ONE')

Based on the question, you seem new to Perl, so you should look at perldoc perlop, among others.
Your %hash keys contain scalar values that are hashrefs. You can dereference using the -> operator, eg, $hashref = {foo=>42}; $hashref->{foo}. Similarly you can do the same with the values in the hash: $hash{a}->{1}. When you chain the indexes, though, there's some syntactic sugar for an implicit -> between them, so you can just do $hash{a}{1} = 'ONE' and so on.
This question probably also will give you some useful leads.

$hash{a}{c} = 4;
$hash{a}{1} = "ONE";
