What is the best way to store 1key - 3 value in Perl? - perl

I have a situation where I have 3 different values for each key. I have to print the data like this:
K1 V1 V2 V3
K2 V1 V2 V3
…
Kn V1 V2 V3
Is there any alternate efficient & easier way to achieve this other that that listed below? I am thinking of 2 approaches:
Maintain 3 hashes for 3 different values for each key.
Iterate through one hash based on the key and get the values from other 2 hashes
and print it.
Hash 1 - K1-->V1 ...
Hash 2 - K1-->V2 ...
Hash 3 - K1-->V3 ...
Maintain a single hash with key to reference to array of values.
Here I need to iterate and read only 1 hash.
K1 --> Ref{V1,V2,V3}
EDIT1:
The main challenge is that, the values V1, V2, V3 are derived at different places and cannot be pushed together as the array. So if I make the hash value as a reference to array, I have to dereference it every time I want to add the next value.
E.g., I am in subroutine1 - I populated Hash1 - K1-->[V1]
I am in subroutine2 - I have to de-reference [V1], then push V2. So now the hash
becomes K1-->[V1 V2], V3 is added in another routine. K1-->[V1 V2 V3]
EDIT2:
Now I am facing another challenge. I have to sort the hash based on the V3.
Still is it feasible to store the hash with key and list reference?
K1-->[V1 V2 V3]

It really depends on what you want to do with your data, although I can't imagine your option 1 being convenient for anything.
Use a hash of arrays if you are happy referring to your V1, V2, V3 using indexes 0, 1, 2 or if you never really want to handle their values separately.
my %data;
$data{K1}[0] = V1;
$data{K1}[1] = V2;
$data{K1}[2] = V3;
or, of course
$data{K1} = [V1, V2, V3];
As an additional option, if your values mean something nameable you could use a hash of hashes, so
my %data;
$data{K1}{name} = V1;
$data{K1}{age} = V2;
$data{K1}{height} = V3;
or
$data{K1}{qw/ name age height /} = (V1, V2, V3);
Finally, if you never need access to the individual values, it would be fine to leave them as they are in the file, like this
my %data;
$data{K1} = "V1 V2 V3";
But as I said, the internal storage is mostly dependent on how you want to access your data, and you haven't told us about that.
Edit
Now that you say
The main challenge is that, the values V1, V2, V3 are derived at
different places and cannot be pushed together as the array
I think perhaps the hash of hashes is more appropriate, but I wouldn't worry at all about dereferencing as it is an insignificant operation as far as execution time is concerned. But I wouldn't use push as that restricts you to adding the data in the correct order.
Depending which you prefer, you have the alternatives of
$data{K1}[2] = V3;
or
$data{K1}{height} = V3;
and clearly the latter is more readable.
Edit 2
As requested, to sort a hash of hashes by the third value (height in my example) you would write
use strict;
use warnings;
my %data = (
K1 => { name => 'ABC', age => 99, height => 64 },
K2 => { name => 'DEF', age => 12, height => 32 },
K3 => { name => 'GHI', age => 56, height => 9 },
);
for (sort { $data{$a}{height} <=> $data{$b}{height} } keys %data) {
printf "%s => %s %s %s\n", $_, #{$data{$_}}{qw/ name age height / };
}
or, if the data was stored as a hash of arrays
use strict;
use warnings;
my %data = (
K1 => [ 'ABC', 99, 64 ],
K2 => [ 'DEF', 12, 32 ],
K3 => [ 'GHI', 56, 9 ],
);
for (sort { $data{$a}[2] <=> $data{$b}[2] } keys %data) {
printf "%s => %s %s %s\n", $_, #{$data{$_}};
}
The output for both scripts is identical
K3 => GHI 56 9
K2 => DEF 12 32
K1 => ABC 99 64

In terms of readability/maintainability the second seems superior to me. The danger with the first is that you could end up with keys present in one hash but not the others. Also, if I came across the first approach, I'd have to think about it for a while, whereas the first seems "natural" (or a more common idiom, or more practical, or something else which means I'd understand it more readily).

The second approach (one array reference for each key) is:
In my experience, far more common,
Easier to maintain, since you only have one data structure floating around instead of three, and
More in line with the DRY principle: "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system." Represent a key once, not three times.

Sure, it's better to mantain only one data structure:
%data = ( K1=>[V1, V2, V3], ... );
You can use Data::Dump for a fast view/debug of your data structure.

The choice really depends on the usage pattern. Specifically, it depends on whether you use procedural program or object-oriented programming.
This is a philosophical difference, and it's unrelated to whether language-level classes and objects are used or not. Procedural programming is organised around work flow; procedures accesses and transforms whatever data it needs. OOP is organised around records of data; methods access and transform one particular record only.
The second approach is closely aligned with object-oriented programming. Object-oriented programming is by far the most common programming style in Perl, so the second approach is almost universally the preferred structure these days (even though it takes more memory).
But your edit implied you might be using a more a procedural approach. As you discovered, the first approach is more convenient for procedural programming. It was very commonly used when procedural programming was in vogue.
Take whatever suits your code's organisation best.

Related

Raku: Trouble Accessing Value of a Multidimensional Hash

I am having issues accessing the value of a 2-dimensional hash. From what I can tell online, it should be something like: %myHash{"key1"}{"key2"} #Returns value
However, I am getting the error: "Type Array does not support associative indexing."
Here's a Minimal Reproducible Example.
my %hash = key1-dim1 => key1-dim2 => 42, key2-dim1 => [42, 42];
say %hash{'key1-dim1'}{'key1-dim2'}; # 42
say %hash{'key2-dim1'}{'foo bar'}; # Type Array does not support associative indexing.
Here's another reproducible example, but longer:
my #tracks = 'Foo Bar', 'Foo Baz';
my %count;
for #tracks -> $title {
$_ = $title;
my #words = split(/\s/, $_);
if (#words.elems > 1) {
my $i = 0;
while (#words.elems - $i > 1) {
my %wordHash = ();
%wordHash.push: (#words[$i + 1] => 1);
%counts.push: (#words[$i] => %wordHash);
say %counts{#words[$i]}{#words[$i+1]}; #===============CRASHES HERE================
say %counts.kv;
$i = $i + 1;
}
}
}
In my code above, the problem line where the 2-d hash value is accessed will work once in the first iteration of the for-loop. However, it always crashes with that error on the second time through. I've tried replacing the array references in the curly braces with static key values in case something was weird with those, but that did not affect the result. I can't seem to find what exactly is going wrong by searching online.
I'm very new to raku, so I apologize if it's something that should be obvious.
After adding the second elements with push to the same part of the Hash, the elment is now an array. Best you can see this by print the Hash before the crash:
say "counts: " ~ %counts.raku;
#first time: counts: {:aaa(${:aaa(1)})}
#second time: counts: {:aaa($[{:aaa(1)}, {:aaa(1)}])}
The square brackets are indicating an array.
Maybe BagHash does already some work for you. See also raku sets without borders
my #tracks = 'aa1 aa2 aa2 aa3', 'bb1 bb2', 'cc1';
for #tracks -> $title {
my $n = BagHash.new: $title.words;
$n.raku.say;
}
#("aa2"=>2,"aa1"=>1,"aa3"=>1).BagHash
#("bb1"=>1,"bb2"=>1).BagHash
#("cc1"=>1).BagHash
Let me first explain the minimal example:
my %hash = key1-dim1 => key1-dim2 => 42,
key2-dim1 => [42, 42];
say %hash{'key1-dim1'}{'key1-dim2'}; # 42
say %hash{'key2-dim1'}{'key2-dim2'}; # Type Array does not support associative indexing.
The problem is that the value associated with key2-dim1 isn't itself a hash but is instead an Array. Arrays (and all other Positionals) only support indexing by position -- by integer. They don't support indexing by association -- by string or object key.
Hopefully that explains that bit. See also a search of SO using the [raku] tag plus 'Type Array does not support associative indexing'.
Your longer example throws an error at this line -- not immediately, but eventually:
say %counts{...}{...}; # Type Array does not support associative indexing.
The hash %counts is constructed by the previous line:
%counts.push: ...
Excerpting the doc for Hash.push:
If a key already exists in the hash ... old and new value are both placed into an Array
Example:
my %h = a => 1;
%h.push: (a => 1); # a => [1,1]
Now consider that the following code would have the same effect as the example from the doc:
my %h;
say %h.push: (a => 1); # {a => 1}
say %h.push: (a => 1); # {a => [1,1]}
Note how the first .push of a => 1 results in a 1 value for the a key of the %h hash, while the second .push of the same pair results in a [1,1] value for the a key.
A similar thing is going on in your code.
In your code, you're pushing the value %wordHash into the #words[$i] key of the %counts hash.
The first time you do this the resulting value associated with the #words[$i] key in %counts is just the value you pushed -- %wordHash. This is just like the first push of 1 above resulting in the value associated with the a key, from the push, being 1.
And because %wordHash is itself a hash, you can associatively index into it. So %counts{...}{...} works.
But the second time you push a value to the same %counts key (i.e. when the key is %counts{#words[$i]}, with #words[$i] set to a word/string/key that is already held by %counts), then the value associated with that key will not end up being associated with %wordHash but instead with [%wordHash, %wordHash].
And you clearly do get such a second time in your code, if the #tracks you are feeding in have titles that begin with the same word. (I think the same is true even if the duplication isn't the first word but instead later ones. But I'm too confused by your code to be sure what the exact broken combinations are. And it's too late at night for me to try understand it, especially given that it doesn't seem important anyway.)
So when your code then evaluates %counts{#words[$i]}{#words[$i+1]}, it is the same as [%wordHash, %wordHash]{...}. Which doesn't make sense, so you get the error you see.
Hopefully the foregoing has been helpful.
But I must say I'm both confused by your code, and intrigued as to what you're actually trying to accomplish.
I get that you're just learning Raku, and that what you've gotten from this SO might already be enough for you, but Raku has a range of nice high level hash like data types and functionality, and if you describe what you're aiming at we might be able to help with more than just clearing up Raku wrinkles that you and we have been dealing with thus far.
Regardless, welcome to SO and Raku. :)
Well, this one was kind of funny and surprising. You can't go wrong if you follow the other question, however, here's a modified version of your program:
my #tracks = ['love is love','love is in the air', 'love love love'];
my %counts;
for #tracks -> $title {
$_ = $title;
my #words = split(/\s/, $_);
if (#words.elems > 1) {
my $i = 0;
while (#words.elems - $i > 1) {
my %wordHash = ();
%wordHash{#words[$i + 1]} = 1;
%counts{#words[$i]} = %wordHash;
say %counts{#words[$i]}{#words[$i+1]}; # The buck stops here
say %counts.kv;
$i = $i + 1;
}
}
}
Please check the line where it crashed before. Can you spot the difference? It was kind of a (un)lucky thing that you used i as a loop variable... i is a complex number in Raku. So it was crashing because it couldn't use complex numbers to index an array. You simply had dropped the $.
You can use sigilless variables in Raku, as long as they're not i, or e, or any of the other constants that are already defined.
I've also made a couple of changes to better reflect the fact that you're building a Hash and not an array of Pairs, as Lukas Valle said.

Is it guaranteed anywhere in docs that hashes with same keys will also have same order?

There's not much guarantees about order of hash keys in Perl. Is there any mention in docs that I can't find that would say that as long as two hashes use exactly same keys, they will go in exactly same order?
Short test seems to confirm that. Even if I generate some additional keys for internal key table between assigning to two different hashes, their keys are returned in same order:
my %aaa;
my %bbb;
my %ccc;
my %ddd;
#aaa{qw(a b c d e f g h i j k l)}=();
# Let's see if generating more keys for internal table matters
#ccc{qw(m n o p q r s t u v w x)}=();
#bbb{qw(a b c d e f g h i j k l)}=();
# Just to test if different insertion order matters
#ddd{qw(l k c d e f g h i j a)}=(); $ddd{b} = ();
print keys %aaa, "\n";
print keys %bbb, "\n";
print keys %ddd, "\n";
However I wouldn't rely on udocumented behavior and only fact that can be easily found in docs is that keys, values and each all will use same order as long as hash is not modified.
From perlsec:
Perl has never guaranteed any ordering of the hash keys, and the
ordering has already changed several times during the lifetime of Perl
5. Also, the ordering of hash keys has always been, and continues to be, affected by the insertion order.
http://perldoc.perl.org/perlsec.html
A longer test disproves.
So, different hashes with the same set of keys won't always have the same order. For me the program below demonstrates that two hashes with keys qw(a b c d e f) can differ in ordering:
v5.16.0
%h1: ecabdf
%h2: eadcbf
Program:
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
# http://stackoverflow.com/q/12724071/132382
use constant KEYS => qw(a b c d e f);
my %h1 = map { $_ => undef } KEYS;
my %h2 = map { $_ => undef } KEYS;
delete #h2{'b', 'd', 'f'};
#h2{'x', 'y', 'z'} = ();
#h2{'b', 'd', 'f'} = ();
delete #h2{'x', 'y', 'z'};
say $^V;
say '%h1: ', keys(%h1);
say '%h2: ', keys(%h2);
Update
Here's a simpler demonstration that insertion order alone matters:
$ perl -MList::Util=shuffle -E \
> '#keys = ('a'..'z'); #h1{#keys} = #h2{shuffle #keys} = ();
> say keys(%$_) for (\%h1, \%h2)'
wraxdjyukhgftienvmslcpqbzo
warxdjyukhgftienmvslpcqbzo
#^^ ^^ ^^
#|| || ||
It is specifically guaranteed that this is undependable.
See the Algorithmic Complexity Attacks section of perlsec in full. Despite its regrettable incoherency, it states that
in 5.8.1, the order is guaranteed to be random every time.
in 5.8.2 and later, the order will be the same unless Perl detects pathological behavior (specifically, a series of keys that would all hash to a small number of buckets, causing the hash performance to suffer). In those cases, "the function is perturbed by a pseudorandom seed".
The documentation does not guarantee that the ordering will always be the same; in fact, it specifically states that it will not be predictable in a pathological case. Should the hashing function be changed in a future release, it's possible that data which previously did not generate degenerate hashing would now do so, and then would be subject to the random perturbation.
So the takeaway is that if you're not using 5.8.1, maybe you'll get the same order, and maybe it won't change when you update your Perl, but it might. If you are using 5.8.1, then a random order is guaranteed.
If you want a dependable order, use one of the CPAN classes that provides a hash that has a guaranteed key order - Tie::Hash::Indexed, Tie::IxHash - or just sort your keys. If you have a hash that has less than a few thousand keys, you probably won't notice an appreciable difference. If it has more than that, maybe you should consider a heavier-weight solution such as a database anyway.
Edit: and just to make it more interesting, keys will be randomly ordered as of 5.18.
Here is a counter-example that is shorter than #pilcrow's (apparently I missed his answer when I first looked at this question):
#!/usr/bin/env perl
use strict; use warnings;
my #hashes = (
{ map { $_ => rand } 'a' .. 'z' },
{ map { $_ => rand } 'a' .. 'd', 'f' .. 'z' }
);
delete $hashes[0]{e};
print "#{[ keys %$_ ]}\n" for #hashes;
Output:
C:\temp> t
w r a x d j y u k h g f t i n v m s l c p q b z o
w r a x d j y u k h g f t i n v m s l c p b q z o
perldoc -f keys has some info on the ordering:
The keys of a hash are returned in an apparently random order. The actual random order is subject to change in future versions of Perl, but it is guaranteed to be the same order as either the values or each function produces (given that the hash has not been modified). Since Perl 5.8.1 the ordering can be different even between different runs of Perl for security reasons (see Algorithmic Complexity Attacks in perlsec).
So the only guarantee is that no ordering is guaranteed.
Since at least 5.18, following is explicitly mentioned in perldoc perlsec:
keys, values, and each return items in a per-hash randomized order.
Modifying a hash by insertion will change the iteration order of that
hash.
Perl has never guaranteed any ordering of the hash keys, and the
ordering has already changed several times during the lifetime of Perl
5. Also, the ordering of hash keys has always been, and continues to be, affected by the insertion order and the history of changes made to
the hash over its lifetime.
Therefore two hashes with same set of keys are explicitly NOT guaranteed to be iterated in same order.

returning multiple dissimilar data structures from R function in PL/R

I have been looking at various discussions here on SO and other places, and the general consensus seems that if one is returning multiple non-similar data structures from an R function, they are best returned as a list(a, b) and then accessed by the indexes 0 and 1 and so on. Except, when using an R function via PL/R inside a Perl program, the R list function flattens the list, and also stringifies even the numbers. For example
my $res = $sth->fetchrow_arrayref;
# now, $res is a single, flattened, stringified list
# even though the R function was supposed to return
# list([1, "foo", 3], [2, "bar"])
#
# instead, $res looks like c(\"1\", \""foo"\", \"3\", \"2\", \""bar"\")
# or some such nonsense
Using a data.frame doesn't work because the two arrays being returned are not symmetrical, and the function croaks.
So, how do I return a single data structure from an R function that is made up of an arbitrary set of nested data structures, and still be able to access each individual bundle from Perl as simply $res->[0], $res->[1] or $res->{'employees'}, $res->{'pets'}? update: I am looking for an R equiv of Perl's [[1, "foo", 3], [2, "bar"]] or even [[1, "foo", 3], {a => 2, b => "bar"}]
addendum: The main thrust of my question is how to return multiple dissimilar data structures from a PL/R function. However, the stringification, as noted above, and secondary, is also problematic because I convert the data to JSON, and all those extra quotes just add to useless data transferred between the server and the user.
I think you have a few problems here. The first is you can't just return an array in this case because it won't pass PostgreSQL's array checks (arrays must be symmetric, all of the same type, etc). Remember that if you are calling PL/R from PL/Perl across a query interface, PostgreSQL type constraints are going to be an issue.
You have a couple of options.
You could return setof text[], with one data type per row.
you could return some sort of structured data using structures PostgreSQL understands, like:
CREATE TYPE ab AS (
a text,
b text
);
CREATE TYPE r_retval AS (
labels text[],
my_ab ab
);
This would allow you to return something like:
{labels => [1, "foo", 3], ab => {a => 'foo', b => 'bar'} }
But at any rate you have to put it into a data structure that the PostgreSQL planner can understand and that is what I think is missing in your example.

Modifying hash within a hash in Perl

What is the shortest amount of code to modify a hash within a hash in the following instances:
%hash{'a'} = { 1 => one,
2 => two };
(1) Add a new key to the inner hash of 'a' (ex: c => 4 in the inner hash of 'a')
(2) Changing a value in the inner hash (ex: change the value of 1 to 'ONE')
Based on the question, you seem new to perl, so you should look at perldoc perlop among others.
Your %hash keys contain scalar values that are hashrefs. You can dereference using the -> operator, eg, $hashref = {foo=>42}; $hashref->{foo}. Similarly you can do the same with the values in the hash: $hash{a}->{1}. When you chain the indexes, though, there's some syntactic sugar for an implicit -> between them, so you can just do $hash{a}{1} = 'ONE' and so on.
This question probably also will give you some useful leads.
$hash{a}{c} = 4;
$hash{a}{1} = "ONE";

Should I use autobox in Perl?

For those unaware of Perl's autobox, it is a module that gives you methods on built in primitives, and lets you even override them.
# primitives
'a string'->toupper();
10->to(1); # returns [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
# Arrays, array refs
[qw(A B C D E)]->for_each( sub { ... } );
#array->length()
# Hashes, hash refs
{ key => 'value', key2 => 'value2' }->values()
%hash->keys()
# Even specify your own base class...
use autobox SCALAR => 'Foo';
It overall makes methods on built in types feel more like objects, simplifying some tasks and making others seem more obvious.
However...
the autobox docs say that there's performance penalties, some more than simply calling the method on the object, much more than the standard syntax. And then, there's a few caveats about its use in evals (specifically, string evals) that might, in some circumstances, cause issues. It also looks like it doesn't come standard with many Perl distros.
Is it ever really worth it to use autobox?
Well, did you ever wish there were a module that did what autobox does before you found out about autobox?
If the answer is 'yes', then you should use it. You might also want to contribute to its development by filing bug reports and fixing them if you get the opportunity.
Unfortunately, I fall into the camp of 'cool, but ...' so I cannot offer you any more insight.
Horses for courses! However reading a chain from left to right is often easier to grok IMHO:
say sort grep /\w/, map { chr } 0 .. 255;
While shorter below does flow nicer:
say [ 0..255 ]->map( sub { chr } )->grep( sub { m/\w/ } )->sort->join('');
ref: snippet from Hacker News comments
/I3az/
I use autobox for:
$c->login($c->req->{params}->hslice([qw/username password/])
This ends up taking an arbitrary hash and reduces it to { username => <whatever>, password => <whatever> }. Normally a lot of code. One symbol with Moose::Autobox.