How to sort output from Data::Printer? - perl

I'm attempting to sort output from Data::Printer and having no luck.
I would like to sort numerically by value, instead of alphabetically by key (which is default).
Inspired by "How do you sort the output of Data::Dumper?", I'm guessing that Data::Printer's sort_methods works similarly to Data::Dumper's Sortkeys:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use autodie ':default';
use DDP {output => 'STDOUT', show_memsize => 1};
my %h = (
'a' => 0,
'b' => 7,
'c' => 5
);
p %h, sort_methods => sub { sort {$_[0]->{$b} <=> $_[0]->{$a}} keys %{$_[0]} };
but this prints out
{
a 0,
b 7,
c 5
} (425B)
but the order should be b, c, and then a.
Curiously, there is no error message.
How can I sort the output of Data::Printer numerically by hash value?

You're not dumping an object, so sort_methods doesn't apply. And if it did, "this option will order them alphabetically".
There is a sort_keys option for hashes, but it only determines "Whether to sort keys when printing the contents of a hash". It defaults to 1, and the documentation mentions no way to set the order. A test confirms that passing a sub is rejected rather than used as a comparator:
$ perl -e'use DDP; p {a=>5}->%*, sort_keys => sub { };'
[Data::Printer] 'sort_keys' property must be a scalar, not a reference to CODE at -e line 1.
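Since Data::Printer offers no custom key order here, one workaround is to sort the keys yourself and print the pairs in that order. The following is a minimal sketch in core Perl only; it bypasses Data::Printer's formatting entirely, and the name @by_value is just illustrative.

```perl
use strict;
use warnings;

my %h = (a => 0, b => 7, c => 5);

# Sort the keys by their values, largest first.
my @by_value = sort { $h{$b} <=> $h{$a} } keys %h;

# Print the pairs in that order.
print "$_ $h{$_}\n" for @by_value;
```

This prints b, c and then a, which is the order the question asks for.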

Related

Is it possible to push a key-value pair directly to hash in perl?

I know pushing is only possible with an array, not a hash. But it would be much more convenient to be able to push a key-value pair directly to a hash (and I am still surprised it is not possible in Perl). I have an example:
#!/usr/bin/perl -w
#superior words begin first, example of that word follow
my @ar = qw[Animals,dog Money,pound Jobs,doctor Food];
my %hash;
my $bool = 1;
sub marine{
my $ar = shift if $bool;
for(@$ar){
my @ar2 = split /,/, $_;
push %hash, ($ar2[0] => $ar2[1]);
}
}
marine(\@ar);
print "$_\n" for keys %hash;
Here I have an array whose elements each hold two words separated by a comma. I would like to make a hash from it, making the first word the key and the second the value (and if the value is missing, as with the last word, Food, then simply undef). How can I do this in Perl?
Output:
Possible attempt to separate words with commas at ./a line 4.
Experimental push on scalar is now forbidden at ./a line 12, near ");"
Execution of ./a aborted due to compilation errors.
I might be oversimplifying things here, but why not simply assign to the hash rather than trying to push into it?
That is, replace this unsupported expression:
push %hash, ($ar2[0] => $ar2[1]);
With:
$hash{$ar2[0]} = $ar2[1];
If I incorporate this in your code, and then dump the resulting hash at the end, I get:
$VAR1 = {
'Food' => undef,
'Money' => 'pound',
'Animals' => 'dog',
'Jobs' => 'doctor'
};
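For reference, a complete version of the script with the assignment in place might look like the sketch below. It is not the asker's exact code: the marine sub and the $bool flag are dropped, the list is written without qw to avoid the comma warning, and the split limit of 2 is an added assumption so that extra commas stay in the value.

```perl
use strict;
use warnings;

my @ar = ('Animals,dog', 'Money,pound', 'Jobs,doctor', 'Food');
my %hash;

for my $item (@ar) {
    # Limit the split to two fields so extra commas stay in the value.
    my ($key, $value) = split /,/, $item, 2;
    $hash{$key} = $value;    # $value is undef when there is no comma
}

# Print in key order; show a missing value as the word "undef".
for my $key (sort keys %hash) {
    my $value = defined $hash{$key} ? $hash{$key} : 'undef';
    print "$key => $value\n";
}
```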
Split inside map and assign directly to a hash like so:
my @ar = qw[Animals,dog Money,pound Jobs,doctor Food];
my %hash_new = map {
my @a = split /,/, $_, 2;
@a == 2 ? @a : (@a, undef)
} @ar;
Note that this can also handle the case with more than one comma delimiter (hence splitting into a max of 2 elements). This can also handle the case with no commas, such as Food - in this case, the list with the single element plus the undef is returned.
If you need to push multiple key/value pairs to (another) hash, or merge hashes, you can assign a list of hashes like so:
%hash = (%hash_old, %hash_new);
Note that the same keys in the old hash will be overwritten by the new hash.
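The overwrite behaviour can be checked with a small sketch. The sample data is made up for illustration, and the hash-slice variant is an alternative that is not part of the answer above.

```perl
use strict;
use warnings;

my %hash_old = (Animals => 'cat', Jobs => 'doctor');
my %hash_new = (Animals => 'dog', Food => undef);

# Later pairs win, so values from %hash_new replace those from %hash_old.
my %hash = (%hash_old, %hash_new);

# Alternative: a hash slice adds the new pairs to an existing hash in place.
my %in_place = %hash_old;
@in_place{ keys %hash_new } = values %hash_new;

print "Animals => $hash{Animals}\n";
```

The slice form avoids rebuilding the whole hash, which may matter when the existing hash is large.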
We can assign this array to a hash, and Perl will automatically treat the values in the array as key-value pairs. The odd elements (first, third, fifth) become the keys and the even elements (second, fourth, sixth) become the corresponding values. See https://perlmaven.com/creating-hash-from-an-array
use strict;
use warnings;
use Data::Dumper qw(Dumper);
my @ar;
my %hash;
#The code in the enclosing block has warnings enabled,
#but the inner block has disabled (misc and qw) related warnings.
{
#You specified an odd number of elements to initialize a hash, which is odd,
#because hashes come in key/value pairs.
no warnings 'misc';
#If your code has use warnings turned on, as it should, then you'll get a warning about
#Possible attempt to separate words with commas
no warnings 'qw';
@ar = qw[Animals,dog Money,pound Jobs,doctor Food];
# join the content of array with comma => Animals,dog,Money,pound,Jobs,doctor,Food
# split the content using comma and assign to hash
# split function returns the list in list context, or the size of the list in scalar context.
%hash = split(",", (join(",", @ar)));
}
print Dumper(\%hash);
Output
$VAR1 = {
'Animals' => 'dog',
'Money' => 'pound',
'Jobs' => 'doctor',
'Food' => undef
};

Add Getopt::Long options in a hash, even when using a repeat specifier

Perl's Getopt::Long allows a developer to add their own options to a script. It's also possible to allow multiple values for an option by the use of a repeat specifier, as seen in regular expressions. For example:
GetOptions('coordinates=f{2}' => \@coor, 'rgbcolor=i{3}' => \@color);
Furthermore, option values can be stored in a hash, like so:
my %h = ();
GetOptions(\%h, 'length=i'); # will store in $h{length}
What I'm trying to do is, combine these two methods to end up with a hash of my options, even when they have multiple values.
As an example, say I want to allow three options: birthday (three integers), parents (one or two strings), first name (exactly one string).
Let's also say that I want to put these values into a hash. I tried the following:
use strict;
use warnings;
use Getopt::Long;
use Data::Dumper;
my %h = ();
GetOptions(\%h, 'bday=i{3}', 'parents=s{1,2}', 'name=s{1}');
print Dumper(\%h);
And tested it, but the output was as follows:
perl optstest.pl --bday 22 3 1986 --parents john mary --name ellen
$VAR1 = {
'name' => 'ellen',
'parents' => 'mary',
'bday' => 1986
};
Only the last value of each option is actually used in the hash. What I would like, though, is:
$VAR1 = {
'name' => 'ellen',
'parents' => ['mary', 'john'],
'bday' => [22, 3, 1986]
};
If 'ellen' would be in an array, or if everything was inside a hash, that'd be fine as well.
Is it not possible to combine these two functionalities of Getopt::Long, i.e. putting options in a hash and using repeat specifiers?
use Getopt::Long;
# enable for debugging purposes
# Getopt::Long::Configure("debug");
use Data::Dumper;
my %h = ();
GetOptions(\%h, 'bday=i{3}', 'parents=s@{1,2}', 'name=s@{1}');
print Dumper(\%h);
Is that what you want?
$VAR1 = {
'bday' => 1986,
'name' => [
'ellen'
],
'parents' => [
'john',
'mary'
]
};
If you want an array, you need to give it a reference to an array.
local @ARGV = qw( --x y z );
my %h = ( x => [] );
GetOptions(\%h, 'x=s{2}');
print(Dumper(\%h));
Or you need to specify that you want an array.
local @ARGV = qw( --x y z );
GetOptions(\my %h, 'x=s@{2}');
print(Dumper(\%h));
Output:
$VAR1 = {
'x' => [
'y',
'z'
]
};
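Applied to the original question, the same idea might look like the sketch below. It assumes the @ destination type combines with a repeat specifier for integers the same way it does for strings above, and it uses GetOptionsFromArray with a stand-in @args array so it can run without a real command line.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for the real command line.
my @args = qw(--bday 22 3 1986 --parents john mary --name ellen);

my %h;
GetOptionsFromArray(
    \@args, \%h,
    'bday=i@{3}',         # exactly three integers, stored in an array ref
    'parents=s@{1,2}',    # one or two strings, stored in an array ref
    'name=s',             # a single string, stored as a plain scalar
) or die "bad options\n";

print "bday: @{ $h{bday} }\n";
print "parents: @{ $h{parents} }\n";
print "name: $h{name}\n";
```

Leaving the @ off name keeps it a plain scalar, which matches the output the question asked for.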
The Options with multiple values section of the documentation that you link to also says this
Warning: What follows is an experimental feature.
It says earlier on
GetOptions (\%h, 'colours=s@'); # will push to @{$h{colours}}
so I guess that it was the author's intent for it to work the same way with repeat specifiers, and that you have found a bug.
I suggest that you report it to the Perl 5 Porters using the perlbug utility that is part of the Perl installation.

What does the default sorting in Perl sort by on an array?

I am reading some Perl code that sorts an array of hashes
uniq sort @{$ProjectData{$project}{Packages}}
I am not clear on whether this is sorting by the key of the hash, the key+value of the hash, or the memory address of the hash. Also, I'm not clear on whether uniq is stable or not. I am going to write my own compare function, but it seems to work as is, so I'd appreciate it if someone could clear up what is happening currently.
If $ProjectData{$project}{Packages} is a reference to an array of hash references, then Perl will stringify all those references to something like HASH(0xf1f6d290) and sort them as strings.
Effectively it's sorting by memory address, but that's really not very useful and you may as well drop the sort, leaving you with
uniq @{ $ProjectData{$project}{Packages} }
If you're using List::Util::uniq then it's stable, but unless you sort the data by something a bit more useful it really doesn't matter
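Sorting "by something a bit more useful" usually means comparing a field inside each hash. A minimal sketch follows; the name and version fields are hypothetical, since the question does not show what the package hashes contain.

```perl
use strict;
use warnings;

# Hypothetical package records; the real fields are not shown in the question.
my @packages = (
    { name => 'zlib',    version => 3 },
    { name => 'openssl', version => 1 },
    { name => 'curl',    version => 2 },
);

# Compare a field inside each hash instead of the stringified reference.
my @by_name = sort { $a->{name} cmp $b->{name} } @packages;

print join(', ', map { $_->{name} } @by_name), "\n";
```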
As for sort @{$ProjectData{$project}{Packages}}, that's just sorting the contents of the array @{$ProjectData{$project}{Packages}} using the default sorting.
According to the docs it uses "standard string comparison order" which means it uses cmp. It means it sorts by turning the arguments into strings and comparing them one character at a time in the order they appear in their character encoding. We used to call this "ASCIIbetical" but now with UTF-8 it's not necessarily ASCII.
Some examples.
# 1, 20, 3 because it considers them "1", "20", "3".
sort 1, 3, 20;
# a, z
sort "z", "a";
# Z, a because ASCII Z is 90 and ASCII a is 97.
sort "Z", "a";
# z, ä because UTF-8 z is 122 and UTF-8 ä is 195 164
use utf8;
sort "z", "ä";
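To make the first example concrete, here is a short sketch contrasting the default string comparison with an explicit numeric one. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @numbers = (1, 3, 20);

my @as_strings = sort @numbers;                  # same as sort { $a cmp $b }
my @as_numbers = sort { $a <=> $b } @numbers;    # numeric comparison

print "@as_strings\n";    # 1 20 3
print "@as_numbers\n";    # 1 3 20
```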
sort sorts the scalars passed to it on the stack[1]. In your case, the scalars placed on the stack are the elements of the array referenced by $ProjectData{$project}{Packages}.
The default compare function sorts the values in ascending lexical order.
sort LIST
is short for
sort { $a cmp $b } LIST
This means the elements to compare are stringified, and compared character-by-character. This sorts abc before def, and 123 before 13.
But you're comparing the stringification of references. The following are examples of stringification of references:
$ perl -E'say {}'
HASH(0x1a36ff8)
$ perl -E'say bless({}, "SomeClass");'
SomeClass=HASH(0x41d0a28)
As you can surmise, the only thing you can count on sort doing when given references is placing identical references together. It won't place references to identical hashes together, only references to the same hash.
uniq from List::MoreUtils is stable.
The order of elements in the returned list is the same as in LIST.
It also uses a lexical comparison of the scalars. This means it will only remove references to the same hash, not references to identical hashes.
$ perl -MData::Dumper=Dumper -MList::MoreUtils=uniq -e'
my $x = { a => 234, b => 345 };
my $y = $x;
my $z = { a => 234, b => 345 };
print(Dumper([ uniq $x, $y, $z ]));
'
$VAR1 = [
{
'b' => 345,
'a' => 234
},
{
'b' => 345,
'a' => 234
}
];
Tip: Unless you're using a method of removing duplicates that requires sorted input, it makes more sense to remove the duplicates before sorting the values.
sort +uniq @{$ProjectData{$project}{Packages}}
(The + prevents uniq from being used as sort's compare function.)
But since all that sort will meaningfully accomplish when given references is placing identical references together, and since uniq already eliminated identical references, the above can be simplified to
uniq @{$ProjectData{$project}{Packages}}
It's far more likely that you want to filter out references to duplicate hashes rather than references to the same hash. Here's how you can do that:
use JSON::XS qw( );
my $json_serializer = JSON::XS->new->canonical;
my %seen;
my @uniques = grep !$seen{ $json_serializer->encode($_) }++, @references;
For example,
$ perl -MData::Dumper=Dumper -MJSON::XS -e'
my $json_serializer = JSON::XS->new->canonical;
my $x = { a => 234, b => 345 };
my $y = $x;
my $z = { a => 234, b => 345 };
my %seen;
print(Dumper([ grep !$seen{ $json_serializer->encode($_) }++, $x, $y, $z ]));
'
$VAR1 = [
{
'b' => 345,
'a' => 234
}
];
[1] OK, technically, it can actually be passed an array as an optimization, but that's an internal detail.

Description of Perl hash syntax in use statements

In Perl the following is allowed
use constant MY_CONSTANT => 1
however this does not match the documentation of "use" which states that it can take a list. The above is however not a list in the normal way as shown by the following command.
perl -e 'use strict; my @l = "test" => 1; print "@l\n"'
This will print "test" and not "test 1".
So is this some special list syntax that can be used together with the use statement or is it also allowed in other cases?
MY_CONSTANT => 1 isn't "a hash".
The => is essentially just a comma, with the additional property that a “bareword” on the left side will be autoquoted: foo => 42 is exactly the same as 'foo', 42. Therefore we can do silly stuff like foo => bar => baz => 42. The “fat comma” should be used to indicate a relation between the left and the right value, e.g. between a hash key and value.
LIST in use Module LIST doesn't mean you need to use the list operator
LIST simply refers to an arbitrary expression that will be evaluated in list context, so not only does list operator MY_CONSTANT => 1 match the specified syntax, but so would the following:
sub f { MY_CONSTANT => 1 }
use constant f();
Be wary of precedence
The next problem you're running into is that the = operator has higher precedence than ,:
my @array = 1, 2, 3;
parses as
(my @array = 1), 2, 3;
As => is the same as ,, the line my @array = test => 1; will parse as
(my @array = "test"), 1;
Use parens to indicate the correct precedence:
my @array = (test => 1);
which will produce your expected output.
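A short sketch confirming both points: with parentheses the array receives two elements, and the fat comma yields the same list as a quoted string followed by a plain comma. The variable names are illustrative.

```perl
use strict;
use warnings;

# The fat comma quotes the bareword to its left; otherwise it acts as a comma.
my @pairs = (test => 1);
my @plain = ('test', 1);

print scalar(@pairs), " elements: @pairs\n";
print "same list\n" if "@pairs" eq "@plain";
```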

How can I access a Getopt::Long option's value in the option's sub?

My goal is to have a --override=f option that manipulates the values of two other options. The trick is figuring out how to refer to the option's value (the part matching the f in the =f designator) in the sub that's executed when GetOptions detects the presence of the option on the command line.
Here is how I'm doing it:
$ cat t.pl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
our %Opt = (
a => 0,
b => 0,
);
our %Options = (
"a=f" => \$Opt{a},
"b=f" => \$Opt{b},
"override=f" => sub { $Opt{$_} = $_[1] for qw(a b); }, # $_[1] is the "trick"
);
GetOptions(%Options) or die "whatever";
print "\$Opt{$_}='$Opt{$_}'\n" for keys %Opt;
$ t.pl --override=5
$Opt{a}='5'
$Opt{b}='5'
$ t.pl --a=1 --b=2 --override=5 --a=3
$Opt{a}='3'
$Opt{b}='5'
The code appears to handle options and overrides just like I want. I have discovered that within the sub, $_[0] contains the name of the option (the full name, even if it's abbreviated on the command line), and $_[1] contains the value. Magic.
I haven't seen this documented, so I'm concerned about whether I'm unwittingly making any mistakes using this technique.
From the fine manual:
When GetOptions() encounters the option, it will call the subroutine with two or three arguments. The first argument is the name of the option. (Actually, it is an object that stringifies to the name of the option.) For a scalar or array destination, the second argument is the value to be stored. For a hash destination, the second argument is the key to the hash, and the third argument the value to be stored.
So, the behavior that you're seeing is documented and you should be safe with it.
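Both call shapes from the quoted passage can be seen in one sketch. The --define option and the @args array are made up for illustration, and the sketch assumes the % destination type in the spec is what triggers the three-argument form for a sub.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for the real command line.
my @args = qw(--a=1 --override=5 --define os=linux);
my %opt  = (a => 0, b => 0);
my (%defines, @seen);

GetOptionsFromArray(
    \@args,
    'a=f' => \$opt{a},
    'b=f' => \$opt{b},
    # Scalar destination: the sub receives (name, value).
    'override=f' => sub {
        my ($name, $value) = @_;
        push @seen, "$name";    # the name object stringifies to the option name
        $opt{$_} = $value for qw(a b);
    },
    # Hash destination (the % in the spec): the sub receives (name, key, value).
    'define=s%' => sub {
        my ($name, $key, $value) = @_;
        $defines{$key} = $value;
    },
) or die "bad options\n";

print "a=$opt{a} b=$opt{b} os=$defines{os}\n";
```

Naming the arguments with a list assignment from @_ reads more clearly than $_[1] and makes the two-versus-three argument difference explicit.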