Does Perl's smartmatch work against arrays (or arrayrefs) of mixed strings and compiled regexes? - perl

I'm hoping to use Perl's smart-matching to do look-ups against an array that contains both strings and compiled regexes:
do_something($file) unless ($file ~~ [ #global_excludes, $local_excludes ]);
(Both the #global_excludes array and the $local_excludes array reference can contain a mixture of strings or compiled regexes.)
Is smart-matching in Perl that smart? Currently, when I run the above with v5.10.1 I get:
Argument "script.sh" isn't numeric in smart match at test.pl line 422.
Argument "Debug.log" isn't numeric in smart match at test.pl line 422.
Argument "lib.pm" isn't numeric in smart match at test.pl line 422.
...
Why does smartmatch think that $file is a number?
For now, I'm just doing it manually:
do_something($file) unless exclude ($file, [ #global_excludes, $local_excludes ]);
where exclude looks like this:
sub exclude
{
my ($file, $list) = #_;
foreach my $lookup (#$list)
{
if (is_regexp($lookup))
{
return 1 if $file =~ $lookup;
}
else
{
return 1 if $file eq $lookup;
}
}
return 0;
}
Basically, I'm looking to make the solution more Perly.

Yes, this does work. The problem is that one of your excludes is a number, not a string. When the right-hand side of a smartmatch is a number, Perl does an == numeric comparison.
my $s = 'foo';
$s ~~ 2; # means $s == 2, warns "$s isn't numeric"
$s ~~ '2'; # means $s eq '2', no warning
If you intend to do a string comparison, make sure your excludes are strings. If necessary, stringify them first (e.g. #array = map { ref($_) ? $_ : "$_" } #array).

Bug found! Was a simple empty string in one of the elements of
[ #global_excludes, $local_excludes ]
I guess in such case perl 5.10.1 figures an empty string for a number

Related

I need to search for a value in a perl array and if I find a match execute some code

This is sort of what I am wanting to do. At present mm returns nothing, while searchname returns the expected value.
This is a perl script embedded in a web page.
I have tried numerous approaches to this code but nothing seems to provide the results I desire. I think it is just a case of syntax.
# search for an item
if ($modtype eq "search") {
$searchname=$modname;
print "Value of searchname $searchname\n";
my #mm = grep{$searchname} #names;
print "Value of mm #mm\n";
if ($mm eq $searchname) {
print "$searchname found!\n";
}
else {
print "$searchname not Found\n";
}
}
my #mm = grep { $_ eq $searchname } #names;
if (#mm) {
print "found\n";
}
grep takes a boolean expression, not just a variable. In that expression, $_ refers to the current list element. By using an equality comparison we get (in #mm) all elements of #names that are equal to $searchname, if any.
To check whether an array is empty, you can simply use it in boolean context, as in if (#mm).
If you don't care about the found elements themselves, just whether there are any, you can use grep in scalar context:
my $count = grep { $_ eq $searchname } #names;
if ($count > 0) {
print "found $count results\n";
}
This will give you the number of matching elements.
If you don't need to know that number, just whether there was any result at all, you can use any from List::Util:
use List::Util qw(any);
if (any { $_ eq $searchname } #names) {
...
}
If #names is big, this is potentially more efficient because it can stop after the first match is found.
I'm not sure what $mm refers to in your code. Did you start your code with use strict; use warnings;? If not, you should.
Looks like you misunderstand a couple of things.
my #mm = grep{$searchname} #names;
The grep() function takes two arguments. A block of code ({ $searchname }) and a list of values (#names). For each value in the list, it puts the value into $_ and executes the code block. If the code block returns a true value then the contents of $_ is added to the output list.
Your block of code ignores $_ and just checks for the value of $searchname. That is very likely to always be true, so all of the values from #names get copied into #mm.
I think it's more likely that you want:
my #mm = grep{ $_ eq $searchname } #names;
Secondly, you suddenly start using a new variable called $mm. I suspect you're getting confused between #mm and $mm which are completely different variables with no connection with each other.
I think what you're actually trying to do is to look at the first element of #mm so you want:
if ($mm[0] eq $searchname)
But, given that values only end up in #mm if they are equal to $searchname (because that's what your grep() does), I think you really just want to check whether or not anything ended up in #mm. So you should use:
if (#mm)
Which is, in my opinion, easier to understand.

Perl module / subroutine to remove shared substring of strings

Given a list/array of strings (in particular, UNIX paths), remove the shared part, eg:
./dir/fileA_header.txt
./dir/fileA_footer.txt
I probably will strip the directory before using the function, but strictly speacking this won't change much.
I'd like to know a method to either remove the shared parts (./dir/fileA_) or remove the not-shared part.
Thank you for your help!
This is a bit of a hack, but if you don't need to support Unicode strings (that is, if all characters have a value below 256), you can use xor to get the length of the longest common prefix of two strings:
my $n = do {
($str1 ^ $str2) =~ /^\0*/;
$+[0]
};
You can apply this operation in a loop to get the common prefix of a list of strings:
use v5.12.0;
use warnings;
sub common_prefix {
my $prefix = shift;
for my $str (#_) {
($prefix ^ $str) =~ /^\0*/;
substr($prefix, $+[0]) = '';
}
return $prefix;
}
my #paths = qw(
./dir/fileA_header.txt
./dir/fileA_footer.txt
);
say common_prefix(#paths);
Output: ./dir/fileA_

Where is the error "Use of uninitialized value in string ne" coming from?

Where is the uninitialised value in the below code?
#!/usr/bin/perl
use warnings;
my #sites = (undef, "a", "b");
my $sitecount = 1;
my $url;
while (($url = $sites[$sitecount]) ne undef) {
$sitecount++;
}
Output:
Use of uninitialized value in string ne at t.pl line 6.
Use of uninitialized value in string ne at t.pl line 6.
Use of uninitialized value in string ne at t.pl line 6.
Use of uninitialized value in string ne at t.pl line 6.
You can't use undef in a string comparison without a warning.
if ("a" ne undef) { ... }
will raise a warning. If you want to test if a variable is defined or not, use:
if (defined $var) { ... }
Comments about the original question:
That's a strange way to iterate over an array. The more usual way of doing this would be:
foreach my $url (#sites) { ... }
and drop the $sitecount variable completely, and don't overwrite $url in the loop body. Also drop the undef value in that array. If you don't want to remove that undef for some reason (or expect undefined values to be inserted in there), you could do:
foreach my $url (#sites) {
next unless defined $url;
...
}
If you do want to test for undefined with your form of loop construct, you'd need:
while (defined $sites[$sitecount]) {
my $url = $sites[$sitecount];
...
$sitecount++;
}
to avoid the warnings, but beware of autovivification, and that loop would stop short if you have undefs mixed in between other live values.
The correct answers have already been given (defined is how you check a value for definedness), but I wanted to add something.
In perlop you will read this description of ne:
Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.
Note the use of "stringwise". It basically means that just like with other operators, such as ==, where the argument type is pre-defined, any arguments to ne will effectively be converted to strings before the operation is performed. This is to accommodate operations such as:
if ($foo == "1002") # string "1002" is converted to a number
if ($foo eq 1002) # number 1002 is converted to a string
Perl has no fixed data types, and relies on conversion of data. In this case, undef (which coincidentally is not a value, it is a function: undef(), which returns the undefined value), is converted to a string. This conversion will cause false positives, that may be hard to detect if warnings is not in effect.
Consider:
perl -e 'print "" eq undef() ? "yes" : "no"'
This will print "yes", even though clearly the empty string "" is not equal to not defined. By using warnings, we can catch this error.
What you want is probably something like:
for my $url (#sites) {
last unless defined $url;
...
}
Or, if you want to skip to a certain array element:
my $start = 1;
for my $index ($start .. $#sites) {
last unless defined $sites[$index];
...
}
Same basic principle, but using an array slice, and avoiding indexes:
my $start = 1;
for my $url (#sites[$start .. $#sites]) {
last unless defined $url;
...
}
Note that the use of last instead of next is the logical equivalent of your while loop condition: When an undefined value is encountered, the loop is exited.
More debugging: http://codepad.org/Nb5IwX0Q
If you, like in this paste above, print out the iteration counter and the value, you will quite clearly see when the different warnings appear. You get one warning for the first comparison "a" ne undef, one for the second, and two for the last. The last warnings come when $sitecount exceeds the max index of #sites, and you are comparing two undefined values with ne.
Perhaps the message would be better to understand if it was:
You are trying to compare an uninitialized value with a string.
The uninitialized value is, of course, undef.
To explicitely check if $something is defined, you need to write
defined $something
ne is for string comparison, and undef is not a string:
#!/usr/bin/perl
use warnings;
('l' ne undef) ? 0 : 0;
Use of uninitialized value in string ne at t.pl line 3.
It does work, but you get a [slightly confusing] warning (at least with use warnings) because undef is not an "initialized value" for ne to use.
Instead, use the operator defined to find whether a value is defined:
#!/usr/bin/perl
use warnings;
my #sites = (undef, "a", "b");
my $sitecount = 1;
my $url;
while (defined $sites[$sitecount]) { # <----------
$url = $sites[$sitecount];
# ...
$sitecount++;
}
... or loop over the #sites array more conventionally, as Mat explores in his answer.

Grep to find item in Perl array

Every time I input something the code always tells me that it exists. But I know some of the inputs do not exist. What is wrong?
#!/usr/bin/perl
#array = <>;
print "Enter the word you what to match\n";
chomp($match = <STDIN>);
if (grep($match, #array)) {
print "found it\n";
}
The first arg that you give to grep needs to evaluate as true or false to indicate whether there was a match. So it should be:
# note that grep returns a list, so $matched needs to be in brackets to get the
# actual value, otherwise $matched will just contain the number of matches
if (my ($matched) = grep $_ eq $match, #array) {
print "found it: $matched\n";
}
If you need to match on a lot of different values, it might also be worth for you to consider putting the array data into a hash, since hashes allow you to do this efficiently without having to iterate through the list.
# convert array to a hash with the array elements as the hash keys and the values are simply 1
my %hash = map {$_ => 1} #array;
# check if the hash contains $match
if (defined $hash{$match}) {
print "found it\n";
}
You seem to be using grep() like the Unix grep utility, which is wrong.
Perl's grep() in scalar context evaluates the expression for each element of a list and returns the number of times the expression was true.
So when $match contains any "true" value, grep($match, #array) in scalar context will always return the number of elements in #array.
Instead, try using the pattern matching operator:
if (grep /$match/, #array) {
print "found it\n";
}
This could be done using List::Util's first function:
use List::Util qw/first/;
my #array = qw/foo bar baz/;
print first { $_ eq 'bar' } #array;
Other functions from List::Util like max, min, sum also may be useful for you
In addition to what eugene and stevenl posted, you might encounter problems with using both <> and <STDIN> in one script: <> iterates through (=concatenating) all files given as command line arguments.
However, should a user ever forget to specify a file on the command line, it will read from STDIN, and your code will wait forever on input
I could happen that if your array contains the string "hello", and if you are searching for "he", grep returns true, although, "he" may not be an array element.
Perhaps,
if (grep(/^$match$/, #array)) more apt.
You can also check single value in multiple arrays like,
if (grep /$match/, #array, #array_one, #array_two, #array_Three)
{
print "found it\n";
}

When does the difference between a string and a number matter in Perl 5?

If a string in Perl 5 passes looks_like_number, it might as well be a number. For instance,
my $s = "10" + 5;
results in $s being assigned 15.
Are there any cases where a string does not behave like its numeric equivalent would?
When dealing with bitwise operators. 123 ^ 456 is 435, but "123" ^ "456" is "\x05\x07\x05". The bitwise operators work numerically if either operand is a number.
I can only think of one: when checking for truth. Strings that are equivalent to 0, but that are not "0", such as "0.0", "0 but true", "0e0", etc. all pass looks_like_number and evaluate to 0 in numeric context, but are still considered true values.
An equivalent number and string behave differently in hash keys -- or, more generally, any time we stringify a largish number:
my (%g, %h);
$g{ 1234000000000000 } = undef; # '1.234e+015' => undef
$h{'1234000000000000'} = undef; # '1234000000000000' => undef
Note that we are still within the range where Perl can store the number precisely:
> perl -e 'printf qq{%.f\n}, 1234000000000000 + $_ for +1, 0, -1'
1234000000000001
1234000000000000
1233999999999999
DB<1> sub is_num { my $x = shift; "$x " ~~ $x }
DB<2> print is_num(123)
1
DB<3> print is_num('123')
DB<4>