Search object array for matching possible multiple values using different comparison operators - perl

I have a function to search an array of objects for a matching value using the eq operator, like so:
sub find {
    my ( $self, %params ) = @_;
    my @entries = @{ $self->{_entries} };
    if ( $params{filename} ) {
        @entries = grep { $_->filename eq $params{filename} } @entries;
    }
    if ( $params{date} ) {
        @entries = grep { $_->date eq $params{date} } @entries;
    }
    if ( $params{title} ) {
        @entries = grep { $_->title eq $params{title} } @entries;
    }
....
I wanted to also be able to pass in a qr quoted variable to use in the comparison instead but the only way I can think of separating the comparisons is using an if/else block, like so:
if (lc ref($params{whatever}) eq 'regexp') {
#use =~
} else {
#use eq
}
Is there a shorter way of doing it? Because of reasons beyond my control I'm using Perl 5.8.8 so I can't use the smart match operator.
TIA

This is Perl, so of course there's a CPAN module: Match::Smart. It works very similarly to Perl 5.10's smart match operator, only you type smart_match($a, $b) rather than $a ~~ $b.
You may wish to compare with the perlsyn documentation for 5.10 smartmatching as Match::Smart handles quite a few more situations.
Otherwise, I don't see anything wrong with:
sub smart_match {
    my ($target, $param) = @_;
    if (ref $param eq 'Regexp') {
        return ($target =~ qr/$param/);
    }
    else {
        return ($target eq $param);
    }
}
@entries = grep { smart_match($_->date, $params{date}) } @entries;

I don't know exactly what you want your end result to be, but you can do:
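for my $field (qw(filename date title)) {
    my $p = $params{$field};
    next unless defined $p;    # no criterion given for this field
    @entries = (ref($p) eq 'Regexp')    # ref() reports qr// objects as 'Regexp'
        ? grep { $_->$field =~ /$p/ } @entries
        : grep { $_->$field eq $p } @entries;
}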
Alternatively, you can turn even your 'eq' comparisons into regular expressions, e.g.:
my $entry = "string to be equal to";
my $re = qr/\A\Q$entry\E\z/;    # anchored at both ends, so it only matches the whole string
and that simplifies the logic in the for loop.
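To make that last suggestion concrete, here is a minimal sketch that converts every criterion to a regex first, so the loop needs only one kind of comparison. The Entry package and the to_regex and find_entries helpers are hypothetical stand-ins for illustration, not part of the original code, and nothing here needs anything newer than Perl 5.8.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real entry class.
package Entry;
sub new      { my ( $class, %args ) = @_; return bless {%args}, $class }
sub filename { $_[0]{filename} }
sub date     { $_[0]{date} }
sub title    { $_[0]{title} }

package main;

# Turn a plain string into an exact-match regex; pass qr// objects through.
sub to_regex {
    my ($p) = @_;
    return ref($p) eq 'Regexp' ? $p : qr/\A\Q$p\E\z/;
}

sub find_entries {
    my ( $entries, %params ) = @_;
    my @found = @$entries;
    for my $field (qw(filename date title)) {
        next unless defined $params{$field};    # criterion not supplied
        my $re = to_regex( $params{$field} );
        @found = grep { $_->$field =~ $re } @found;
    }
    return @found;
}

my @entries = (
    Entry->new( filename => 'a.txt', date => '2010-01-01', title => 'Alpha' ),
    Entry->new( filename => 'b.txt', date => '2010-01-02', title => 'Beta' ),
    Entry->new( filename => 'c.log', date => '2010-01-02', title => 'Gamma' ),
);

# A plain string is compared exactly; a qr// object is used as a pattern.
my @by_date  = find_entries( \@entries, date => '2010-01-02' );
my @by_regex = find_entries( \@entries, date => '2010-01-02', filename => qr/\.txt\z/ );
print scalar(@by_date), " ", scalar(@by_regex), "\n";
```

The trade-off is that every comparison now goes through the regex engine, which is probably fine for small lists but is an assumption worth measuring on large ones.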

Related

Optimize perl hash mess

I have Perl code, which looks messy:
my $x = $h->[1];
foreach my $y (keys %$x) {
my $ax = $x->{$y};
foreach my $ay (keys %$ax) {
if (ref($ax->{$ay}) eq 'JE::Object::Proxy') {
my $bx = $ax->{$ay};
if ($$bx->{class_info}->{name} eq 'HTMLImageElement') {
print $$bx->{value}->{src}, "\n";
}
}
}
}
Is it possible to optimize the code above to not use any variables, just $h, as that one is an input?
Here's my crack at it:
print $$_->{value}{src}, "\n" for grep {
ref $_ eq 'JE::Object::Proxy' &&
$$_->{class_info}{name} eq 'HTMLImageElement'
} map {
values %$_
} values %{ $h->[1] };
You're using keys, when you really just want values.
foreach my $h ( grep { ref() eq 'HASH' } values %$x ) {
    foreach my $obj (
        grep { ref() eq 'JE::Object::Proxy'
               and $$_->{class_info}{name} eq 'HTMLImageElement'
        } values %$h
    ) {
        say $$obj->{value}{src};    # say needs Perl 5.10+ and "use feature 'say'"
    }
}
A lot of the "messiness" can be cleaned up by reducing your line count and minimizing how much nested code you have. Use the each command to get the next key and its associated value from the hash in one line. [EDIT: as Axeman pointed out, you really only need the values, so I'm replacing my use of each with values]. Also, use a pair of next statements to skip the print statement.
for my $ax (values %{$h->[1]} ) {
for my $bx (values %$ax ) {
next unless ref($bx) eq 'JE::Object::Proxy';
next unless $$bx->{class_info}->{name} eq 'HTMLImageElement';
print "$$bx->{value}->{src}\n";
}
}
Just removing the helper variables is easy, something like this should do it:
foreach my $y (keys %{$h->[1]}) {
    foreach my $ay (keys %{$h->[1]->{$y}}) {
        if (ref($h->[1]->{$y}->{$ay}) eq 'JE::Object::Proxy') {
            if (${$h->[1]->{$y}->{$ay}}->{class_info}->{name} eq 'HTMLImageElement') {
                print ${$h->[1]->{$y}->{$ay}}->{value}->{src}, "\n";
            }
        }
    }
}
You could also remove the duplicated if:
foreach my $y (keys %{$h->[1]}) {
    foreach my $ay (keys %{$h->[1]->{$y}}) {
        if (ref($h->[1]->{$y}->{$ay}) eq 'JE::Object::Proxy' && ${$h->[1]->{$y}->{$ay}}->{class_info}->{name} eq 'HTMLImageElement') {
            print ${$h->[1]->{$y}->{$ay}}->{value}->{src}, "\n";
        }
    }
}
But I don't really see how to make it more readable: it is an iteration over a three-dimensional structure.

Perl Hashref Substitutions

I'm using DBI to connect to Sybase to grab records in a hash_ref element. The DBD::Sybase driver has a nasty habit of returning records with trailing characters, specifically \x00 in my case. I'm trying to write a function to clean this up for all elements in the hashref. The code I have below does the trick, but I can't find a way to make it leaner, and I know there is a way to do this better:
#!/usr/bin/perl
my $dbh = DBI->connect('dbi:Sybase:...');
my $sql = qq {SELECT * FROM table WHERE age > 18;};
my $qry = $dbh->selectall_hashref($sql, 'Name');
foreach my $val(values %$qry) {
$qry->{$val} =~ s/\x00//g;
}
foreach my $key(keys %$qry) {
$qry->{$key} =~ s/\x00//g;
foreach my $val1(keys %{$qry->{$key}}) {
$qry->{$key}->{$val1} =~ s/\x00//g;
}
foreach my $key1(keys %{$qry->{$key}}) {
$qry->{$key}->{$key1} =~ s/\x00//g;
}
While I think that a regex substitution is not exactly an ideal solution (seems like it should be fixed properly instead), here's a handy way to solve it with chomp.
use Data::Dumper;
my %a = (
foo => {
a => "foo\x00",
b => "foo\x00"
},
bar => {
c => "foo\x00",
d => "foo\x00"
},
baz => {
e => "foo\x00",
f => "foo\x00"
}
);
$Data::Dumper::Useqq=1;
print Dumper \%a;
{
local $/ = "\x00";
chomp %$_ for values %a;
}
print Dumper \%a;
chomp will remove a single trailing value equal to whatever the input record separator $/ is set to. When used on a hash, it will chomp the values.
As you will note, we do not need to use the values directly, as they are aliased. Note also the use of a block around the local $/ statement to restrict its scope.
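The "single trailing value" detail matters if a field can end in more than one null. A small sketch of the difference, using a made-up %row hash (not data from the question):

```perl
use strict;
use warnings;

my %row = ( one => "foo\x00", two => "bar\x00\x00", three => "baz" );

{
    local $/ = "\x00";    # chomp strips one trailing copy of $/
    chomp %row;           # on a hash, chomp acts on the values
}

# Only one of the two trailing nulls has been removed from $row{two}.
my $after_chomp = $row{two};

# A substitution anchored at the end of the string removes the whole run.
# The values are aliased, so this edits %row in place.
s/\x00+\z// for values %row;

printf "after chomp: %d chars, after s///: %d chars\n",
    length $after_chomp, length $row{two};
```

So chomp is enough when the driver pads with exactly one null; otherwise the anchored substitution is the safer choice.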
For a more manageable solution, it's probably best to make a subroutine, called recursively. I used chomp again here, but you can just as easily skip that and use s/\x00//g. Or tr/\x00//d, which basically does the same thing. chomp is only safer in that it only removes characters from the end of the string, like s/\x00$// would.
strip_null(\%a);
print Dumper \%a;
sub strip_null {
local $/ = "\x00";
my $ref = shift;
for (values %$ref) {
if (ref eq 'HASH') {
strip_null($_); # recursive strip
} else {
chomp;
}
}
}
First your code:
foreach my $val(values %$qry) {
$qry->{$val} =~ s/\x00//g;
# here you are using a value as if it was a key
}
foreach my $key(keys %$qry) {
$qry->{$key} =~ s/\x00//g;
foreach my $val1(keys %{$qry->{$key}}) {
$qry->{$key}->{$val1} =~ s/\x00//g;
}
foreach my $key1(keys %{$qry->{$key}}) {
$qry->{$key}->{$key1} =~ s/\x00//g;
}
# and this does the same thing twice...
What you should do is:
foreach my $x (values %$qry) {
foreach my $y (ref $x eq 'HASH' ? values %$x : $x) {
$y =~ s/(?:\x00)+$//
}
}
This will clean up only trailing nulls in the values at two levels of the hash.
The body of the loop could also be written as:
if (ref $x eq 'HASH') {
foreach my $y (values %$x) {
$y =~ s/(?:\x00)+$//
}
}
else {
$x =~ s/(?:\x00)+$//
}
But that forces you to write the substitution twice, and you shouldn't repeat yourself.
Or if you really want to reduce the code, using the implicit $_ variable works well:
for (values %$qry) {
s/(?:\x00)+$// for ref eq 'HASH' ? values %$_ : $_
}

Perl: Matching hash keys to a regular expression

I'm wondering if Perl has a built-in way to check for the existence of a hash element with a key matching a particular regex. For example:
my %h = ( 'twelve' => 12, 'thirteen' => 13, 'fourteen' => 14 );
I'm wondering if there is any way to do this:
print "We have 12\n" if exists $h{twelve};
print "We have some teens\n" if exists $h{/.*teen$/};
The smart match operator does this (available since Perl v5.10).
$a      $b        Type of Match Implied    Matching Code
======  =====     =====================    =============
...
Regex   Hash      hash key grep            grep /$a/, keys %$b
...
Sample usage:
# print if any key in %h ends in "teen"
print "We have some teens\n" if /.*teen$/ ~~ %h;
In addition to the other answers here you can also do this with perl's grep:
print "We have some teens\n" if grep {/.*teen/} keys %h;
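For what it's worth, the same grep gives you either the matching keys or just a count, depending on context. A minimal sketch using the %h from the question (the variable names are only for illustration):

```perl
use strict;
use warnings;

my %h = ( twelve => 12, thirteen => 13, fourteen => 14 );

# List context: the matching keys themselves (sorted first for a stable order).
my @teen_keys = grep { /teen\z/ } sort keys %h;

# Scalar context: how many keys matched; 0 is false, so it works as a test.
my $teen_count = grep { /teen\z/ } keys %h;

# A hash slice fetches the values for the matching keys.
my @teen_values = @h{@teen_keys};

print "We have $teen_count teens: @teen_keys (@teen_values)\n" if $teen_count;
```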
Yeah, it's called:
use List::Util qw<first>;
# Your regex does not compile perhaps you mean /teen$/
my $value = $hash{ ( first { m/teen/ } keys %hash ) || '' };
(Before smart match, that is. See mob's answer for smart match.)
You could also sort the keys:
my $value = $hash{ ( first { m/teen/ } sort keys %hash ) || '' };
I would freeze this into an "operation":
use Scalar::Util qw<reftype>;
sub values_for_keys_like (\[%$]$) {
    my ( $arg, $regex ) = @_;
    # the prototype passes a reference: to the hash itself, or to a scalar holding a hash ref
    my $ref = reftype( $arg ) eq 'HASH' ? $arg : $$arg;
    return unless my @keys = keys %$ref;
    # allow strings
    $regex = qr/$regex/ unless my $typ = ref( $regex );
    # allow regex or just plain ol' filter functions.
    my $test = $typ eq 'CODE' ? $regex : sub { return unless m/$regex/; 1 };
    if ( wantarray ) {
        return unless my @k = grep { defined $test->( $_ ) } @keys;
        return @$ref{ @k };
    }
    else {
        return unless my $key = first { defined $test->( $_ ) } @keys;
        return $ref->{ $key };
    }
}
And you could use it like so:
my $key = values_for_keys_like( %hash => qr/teen/ );
Or
my $key = values_for_keys_like( $base->{level_two}{level_three} => qr/teen/ );
There's no built-in way, but there's Tie::Hash::Regex on CPAN.

Where can I find an array of the (un)assigned Unicode code points for a particular block?

At the moment, I'm writing these arrays by hand.
For example, the Miscellaneous Mathematical Symbols-A block has an entry in hash like this:
my %symbols = (
...
miscellaneous_mathematical_symbols_a => [(0x27C0..0x27CA), 0x27CC,
(0x27D0..0x27EF)],
...
)
The simpler, 'continuous' array
miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF]
doesn't work because Unicode blocks have holes in them. For example, there's nothing at 0x27CB. Take a look at the code chart [PDF].
Writing these arrays by hand is tedious, error-prone and a bit fun. And I get the feeling that someone has already tackled this in Perl!
Perhaps you want Unicode::UCD? Use its charblock routine to get the range of any named block. If you want to get those names, you can use charblocks.
This module is really just an interface to the Unicode databases that come with Perl already, so if you have to do something fancier, you can look at the lib/5.x.y/unicore/UnicodeData.txt or the various other files in that same directory to get what you need.
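As a rough sketch of the charblock route for the block in the question: the result depends on the Unicode version bundled with your perl, so the list of assigned code points may differ from the hand-written array. The variable names are just for illustration.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charblock);

# charblock() with a block name returns a reference to a list of
# [ start, end, name ] ranges.
my $ranges = charblock('Miscellaneous Mathematical Symbols-A');
my ( $start, $stop ) = @{ $ranges->[0] };

# Keep only the code points that this perl's Unicode tables mark as assigned.
my @assigned = grep { chr($_) =~ /\A\p{Assigned}\z/ } $start .. $stop;

printf "%X..%X: %d of %d code points assigned\n",
    $start, $stop, scalar @assigned, $stop - $start + 1;
```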
Here's what I came up with to create your %symbols. I go through all the blocks (although in this sample I skip the ones without "Math" in their name). I get the starting and ending code points and check which ones are assigned. From that, I create a custom property that I can use to check if a character is in the range and assigned.
use strict;
use warnings;
digest_blocks();
my $property = 'My::InMiscellaneousMathematicalSymbolsA';
foreach ( 0x27BA..0x27F3 )
{
my $in = chr =~ m/\p{$property}/;
printf "%X is %sin $property\n",
$_, $in ? '' : ' not ';
}
sub digest_blocks {
use Unicode::UCD qw(charblocks);
my $blocks = charblocks();
foreach my $block ( keys %$blocks )
{
next unless $block =~ /Math/; # just to make the output small
my( $start, $stop ) = @{ $blocks->{$block}[0] };
$blocks->{$block} = {
assigned => [ grep { chr =~ /\A\p{Assigned}\z/ } $start .. $stop ],
unassigned => [ grep { chr !~ /\A\p{Assigned}\z/ } $start .. $stop ],
start => $start,
stop => $stop,
name => $block,
};
define_my_property( $blocks->{$block} );
}
}
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = join "\n", # can do ranges here too
map { sprintf "%X", $_ }
@{ $block->{assigned} };
*{"My::In$subname"} = sub { $string };
}
If I were going to do this a lot, I'd use the same thing to create a Perl source file that has the custom properties already defined so I can just use them right away in any of my work. None of the data should change until you update your Unicode data.
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = num2range( @{ $block->{assigned} } );
print <<"HERE";
sub My::In$subname {
return <<'CODEPOINTS';
$string
CODEPOINTS
}
HERE
}
# http://www.perlmonks.org/?node_id=87538
sub num2range {
local $_ = join ',' => sort { $a <=> $b } @_;
s/(?<!\d)(\d+)(?:,((??{$++1})))+(?!\d)/$1\t$+/g;
s/(\d+)/ sprintf "%X", $1/eg;
s/,/\n/g;
return $_;
}
That gives me output suitable for a Perl library:
sub My::InMiscellaneousMathematicalSymbolsA {
return <<'CODEPOINTS';
27C0 27CA
27CC
27D0 27EF
CODEPOINTS
}
sub My::InSupplementalMathematicalOperators {
return <<'CODEPOINTS';
2A00 2AFF
CODEPOINTS
}
sub My::InMathematicalAlphanumericSymbols {
return <<'CODEPOINTS';
1D400 1D454
1D456 1D49C
1D49E 1D49F
1D4A2
1D4A5 1D4A6
1D4A9 1D4AC
1D4AE 1D4B9
1D4BB
1D4BD 1D4C3
1D4C5 1D505
1D507 1D50A
1D50D 1D514
1D516 1D51C
1D51E 1D539
1D53B 1D53E
1D540 1D544
1D546
1D54A 1D550
1D552 1D6A5
1D6A8 1D7CB
1D7CE 1D7FF
CODEPOINTS
}
sub My::InMiscellaneousMathematicalSymbolsB {
return <<'CODEPOINTS';
2980 29FF
CODEPOINTS
}
sub My::InMathematicalOperators {
return <<'CODEPOINTS';
2200 22FF
CODEPOINTS
}
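The num2range helper above relies on a recursive regex, which is compact but hard to read. As an alternative sketch (ranges_to_lines is a hypothetical name, not from the PerlMonks node), the same collapsing can be done with a plain loop:

```perl
use strict;
use warnings;

# Collapse a list of integers into "START\tEND" or "SINGLE" lines, in hex.
sub ranges_to_lines {
    my @points = sort { $a <=> $b } @_;
    my @lines;
    while (@points) {
        my $first = shift @points;
        my $last  = $first;
        # Extend the run while the next code point is consecutive.
        $last = shift @points while @points && $points[0] == $last + 1;
        push @lines, $first == $last
            ? sprintf( "%X",     $first )
            : sprintf( "%X\t%X", $first, $last );
    }
    return join "\n", @lines;
}

my $text = ranges_to_lines( 0x27C0 .. 0x27CA, 0x27CC, 0x27D0 .. 0x27EF );
print "$text\n";
```

The tab between the two ends of a range is the separator that user-defined property subs expect.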
Maybe this?
my @list =
    grep { chr($_) =~ /^\p{Assigned}$/ }
    0x27C0..0x27EF;
@list = map { sprintf("%X", $_) } @list;
print "@list\n";
Gives me
27C0 27C1 27C2 27C3 27C4 27C5 27C6 27C7 27C8 27C9 27CA 27D0 27D1 27D2 27D3
27D4 27D5 27D6 27D7 27D8 27D9 27DA 27DB 27DC 27DD 27DE 27DF 27E0 27E1 27E2
27E3 27E4 27E5 27E6 27E7 27E8 27E9 27EA 27EB
I don't know why you wouldn't say miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF], because that's how the Unicode standard is defined according to the PDF.
What do you mean when you say it doesn't "work"? If it's giving you some sort of error when you check the existence of the character in the block, then why not just weed them out of the block when your checker comes across an error?

Perl: if ( element in list )

I'm looking for presence of an element in a list.
In Python there is an in keyword and I would do something like:
if element in list:
    doTask
Is there something equivalent in Perl without having to manually iterate through the entire list?
UPDATE:
The smartmatch family of features are now experimental
Smart match, added in v5.10.0 and significantly revised in v5.10.1, has been a regular point of complaint. Although there are a number of ways in which it is useful, it has also proven problematic and confusing for both users and implementors of Perl. There have been a number of proposals on how to best address the problem. It is clear that smartmatch is almost certainly either going to change or go away in the future. Relying on its current behavior is not recommended.
Warnings will now be issued when the parser sees ~~, given, or when.
If you can get away with requiring Perl v5.10, then you can use any of the following examples.
The smart match ~~ operator.
if( $element ~~ @list ){ ... }
if( $element ~~ [ 1, 2, 3 ] ){ ... }
You could also use the given/when construct, which uses the smart match functionality internally.
given( $element ){
    when( @list ){ ... }
}
You can also use a for loop as a "topicalizer" (meaning it sets $_).
for( @elements ){
    when( @list ){ ... }
}
Perl 5.12 added the ability to use the postfix version of when, which makes it even more like if and unless.
given( $element ){
    ... when @list;
}
If you have to be able to run on older versions of Perl, there still are several options.
You might think you can get away with using List::Util::first, but there are some edge conditions that make it problematic.
In this example it is fairly obvious that we want to successfully match against 0. Unfortunately this code will print failure every time.
use List::Util qw'first';
my $element = 0;
if( first { $element eq $_ } 0..9 ){
print "success\n";
} else {
print "failure\n";
}
You could check the return value of first for defined-ness, but that will fail if we actually want a match against undef to succeed.
You can safely use grep however.
if( grep { $element eq $_ } 0..9 ){ ... }
This is safe because grep gets called in scalar context, where it returns the number of elements for which the block was true. So this will continue to work even if we try to match against undef.
You could use an enclosing for loop. Just make sure you call last, to exit out of the loop on a successful match. Otherwise you might end up running your code more than once.
for( @array ){
if( $element eq $_ ){
...
last;
}
}
You could put the for loop inside the condition of the if statement ...
if(
do{
my $match = 0;
for( @list ){
if( $element eq $_ ){
$match = 1;
last;
}
}
$match; # the return value of the do block
}
){
...
}
... but it might be more clear to put the for loop before the if statement.
my $match = 0;
for( @list ){
if( $_ eq $element ){
$match = 1;
last;
}
}
if( $match ){ ... }
If you're only matching against strings, you could also use a hash. This can speed up your program if @array is large and you are going to match against %hash several times, especially if @array doesn't change, because then you only have to load up %hash once.
my %hash = map { $_, 1 } @array;
if( $hash{ $element } ){ ... }
You could also make your own subroutine. This is one of the cases where it is useful to use prototypes.
sub in(&@){
    local $_;
    my $code = shift;
    for( @_ ){ # sets $_
        if( $code->() ){
            return 1;
        }
    }
    return 0;
}
if( in { $element eq $_ } @list ){ ... }
if( $element ~~ @list ){
do_task
}
~~ is the "smart match operator", and does more than just list membership detection.
grep is helpful here
if (grep { $_ eq $element } @list) {
....
}
If you plan to do this many times, you can trade-off space for lookup time:
#!/usr/bin/perl
use strict; use warnings;
my @array = qw( one ten twenty one );
my %lookup = map { $_ => undef } @array;
for my $element ( qw( one two three ) ) {
if ( exists $lookup{ $element }) {
print "$element\n";
}
}
assuming that the number of times the element appears in @array is not important and the contents of @array are simple scalars.
List::Util::first
$foo = first { $_ && $_ eq "value" } @list; # first element of @list equal to "value"
Or for hand-rolling types:
my $is_in_list = 0;
foreach my $elem (@list) {
if ($elem && $elem eq $value_to_find) {
$is_in_list = 1;
last;
}
}
if ($is_in_list) {
...
A slightly different version MIGHT be somewhat faster on very long lists:
my $is_in_list = 0;
for (my $i = 0; $i < scalar(@list); ++$i) {
if ($list[$i] && $list[$i] eq $value_to_find) {
$is_in_list = 1;
last;
}
}
if ($is_in_list) {
...
TIMTOWTDI
sub is (&@) {
    my $test = shift;
    $test->() and return 1 for @_;
    0
}
sub in (@) {@_}
if( is {$_ eq "a"} in qw(d c b a) ) {
    print "Welcome in perl!\n";
}
List::MoreUtils
On perl >= 5.10 the smart match operator is surely the easiest way, as many others have already said.
On older versions of perl, I would instead suggest List::MoreUtils::any.
List::MoreUtils is not a core module (some say it should be) but it's very popular and it's included in major perl distributions.
It has the following advantages:
it returns true/false (as Python's in does) and not the value of the element, as List::Util::first does (which makes it hard to test, as noted above);
unlike grep, it stops at the first element which passes the test (perl's smart match operator short circuits as well);
it works with any perl version (well, >= 5.00503 at least).
Here is an example which works with any searched (scalar) value, including undef:
use List::MoreUtils qw(any);
my $value = 'test'; # or any other scalar
my @array = (1, 2, undef, 'test', 5, 6);
no warnings 'uninitialized';
if ( any { $_ eq $value } @array ) {
print "$value present\n"
}
P.S.
(In production code it's better to narrow the scope of no warnings 'uninitialized').
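On newer perls the same function is also available from the core List::Util module (any was added in List::Util 1.33, which ships with Perl 5.20 and later). Here is a small sketch that also counts how many elements each approach tests, to illustrate the short-circuit point above; the counters are only there for the demonstration.

```perl
use strict;
use warnings;
use List::Util 1.33 qw(any);    # any() was added to List::Util in 1.33

my @array = ( 1, 2, undef, 'test', 5, 6 );
my $value = 'test';

my ( $any_tests, $grep_tests ) = ( 0, 0 );

# any() stops at the first element for which the block is true.
my $found_any = any { $any_tests++; defined($_) && $_ eq $value } @array;

# grep always visits every element; in scalar context it returns a count.
my $found_grep = grep { $grep_tests++; defined($_) && $_ eq $value } @array;

print "any: $any_tests tests, grep: $grep_tests tests\n";
```

Checking defined($_) inside the block is an alternative to switching off the 'uninitialized' warning, at the cost of never matching undef itself.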
Probably Perl6::Junction is the clearest way to do it. No XS dependencies, no mess and no new perl version required.
use Perl6::Junction qw/ any /;
if (any(@grant) eq 'su') {
...
}
This blog post discusses the best answers to this question.
As a short summary, if you can install CPAN modules then the best solutions are:
if any(@ingredients) eq 'flour';
or
if @ingredients->contains('flour');
However, a more usual idiom is:
if any { $_ eq 'flour' } @ingredients
which I find less clear.
But please don't use the first() function! It doesn't express the intent of your code at all. Don't use the "Smart match" operator: it is broken. And don't use grep() or the solution with a hash: they iterate through the whole list, while any() will stop as soon as it finds your value.
Check out the blog post for more details.
PS: I'm answering for people who will have the same question in the future.
You can accomplish a similar enough syntax in Perl if you do some Autoload hacking.
Create a small package to handle the autoload:
package Autoloader;
use strict;
use warnings;
our $AUTOLOAD;
sub AUTOLOAD {
my $self = shift;
my ($method) = (split(/::/, $AUTOLOAD))[-1];
die "Object does not contain method '$method'" if not ref $self->{$method} eq 'CODE';
goto &{$self->{$method}};
}
1;
Then your other package or main script will contain a subroutine that returns the blessed object which gets handled by Autoload when its method attempts to be called.
sub element {
my $elem = shift;
my $sub = {
in => sub {
return if not $_[0];
# you could also implement this as any of the other suggested grep/first/any solutions already posted.
my %hash; @hash{@_} = ();
return (exists $hash{$elem}) ? 1 : ();
}
};
bless($sub, 'Autoloader');
}
This leaves you with usage looking like:
doTask if element('something')->in(@array);
If you reorganize the closure and its arguments, you can switch the syntax around the other way to make it look like this, which is a bit closer to the autobox style:
doTask if search(@array)->contains('something');
A function to do that:
sub search {
my @arr = @_;
my $sub = {
contains => sub {
my $elem = shift or return;
my %hash; @hash{@arr} = ();
return (exists $hash{$elem}) ? 1 : ();
}
};
bless($sub, 'Autoloader');
}