What is the smartest way of searching through an array of strings for a matching string in Perl?
One caveat, I would like the search to be case-insensitive
so "aAa" would be in ("aaa","bbb")
It depends on what you want the search to do:
if you want to find all matches, use the built-in grep:
my @matches = grep { /pattern/ } @list_of_strings;
if you want to find the first match, use first in List::Util:
use List::Util 'first';
my $match = first { /pattern/ } @list_of_strings;
if you want to find the count of all matches, use true in List::MoreUtils:
use List::MoreUtils 'true';
my $count = true { /pattern/ } @list_of_strings;
if you want to know the index of the first match, use first_index in List::MoreUtils:
use List::MoreUtils 'first_index';
my $index = first_index { /pattern/ } @list_of_strings;
if you want to simply know if there was a match, but you don't care which element it was or its value, use any in List::Util:
use List::Util 1.33 'any';
my $match_found = any { /pattern/ } @list_of_strings;
All these examples do similar things at their core, but the library implementations have been heavily optimized, and will usually be faster than an equivalent that you might write yourself in pure Perl with grep, map or a for loop.
Note that the algorithm for doing the looping is a separate issue from performing the individual matches. To match a string case-insensitively, you can simply use the i flag in the pattern: /pattern/i. You should definitely read through perldoc perlre if you have not previously done so.
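To make the case-insensitive part concrete, here is a minimal sketch. The helper name find_ci and the sample data are made up for illustration. It assumes you want whole-string equality rather than a substring match: \Q...\E quotes any regex metacharacters in the search string, and \A and \z anchor the pattern to the whole string.

```perl
use strict;
use warnings;

# Hypothetical helper: return all elements equal to $wanted, ignoring case.
sub find_ci {
    my ($wanted, @list) = @_;
    # \Q..\E treats $wanted as literal text; \A and \z anchor to the whole string.
    return grep { /\A\Q$wanted\E\z/i } @list;
}

my @list_of_strings = ('aaa', 'bbb', 'AAA', 'a.a');
my @matches = find_ci('aAa', @list_of_strings);
print "@matches\n";    # prints "aaa AAA"
```

Without \Q...\E a search string such as 'a.a' would be treated as a pattern and would also match 'aaa'.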
I guess
@foo = ("aAa", "bbb");
@bar = grep(/^aaa/i, @foo);
print join ",",@bar;
would do the trick.
Perl 5.10+ contains the 'smart-match' operator ~~, which returns true if a certain element is contained in an array or hash, and false if it isn't (see perlfaq4).
The nice thing is that it also supports regexes, meaning that your case-insensitive requirement can easily be taken care of:
use strict;
use warnings;
use 5.010;
my @array = qw/aaa bbb/;
my $wanted = 'aAa';
say "'$wanted' matches!" if /$wanted/i ~~ @array; # Prints "'aAa' matches!"
If you will be doing many searches of the array, AND matching always is defined as string equivalence, then you can normalize your data and use a hash.
my @strings = qw( aAa Bbb cCC DDD eee );
my %string_lut;
# Init via slice:
@string_lut{ map uc, @strings } = ();
# or use a for loop:
# for my $string ( @strings ) {
# $string_lut{ uc($string) } = undef;
# }
# Look for a string:
my $search = 'AAa';
print "'$search' ",
( exists $string_lut{ uc $search } ? "IS" : "is NOT" ),
" in the array\n";
Let me emphasize that doing a hash lookup is good if you are planning on doing many lookups on the array. Also, it will only work if matching means that $foo eq $bar, or other requirements that can be met through normalization (like case insensitivity).
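As a rough sketch of the "build once, look up many times" pattern described above: the helper name in_list and the choice of lc for normalization are assumptions of this example, not something a particular module provides.

```perl
use strict;
use warnings;

my @strings = qw( aAa Bbb cCC DDD eee );

# Build the lookup table once; the keys are the lower-cased strings.
my %seen = map { lc($_) => 1 } @strings;

# Hypothetical helper: case-insensitive membership test via the hash.
sub in_list {
    my ($candidate) = @_;
    return exists $seen{ lc $candidate };
}

# Each lookup is a single hash access, however long @strings is.
for my $search (qw( AAa ddd zzz )) {
    printf "'%s' %s in the array\n", $search, in_list($search) ? 'IS' : 'is NOT';
}
```

The cost of building %seen is paid once, so this only pays off when the same list is searched repeatedly.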
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my @bar = qw(aaa bbb);
my @foo = grep {/aAa/i} @bar;
print Dumper \@foo;
Perl string match can also be used for a simple yes/no.
my @foo=("hello", "world", "foo", "bar");
if ("@foo" =~ /\bhello\b/){
print "found";
}
else{
print "not found";
}
For just a boolean match result or for a count of occurrences, you could use:
use 5.014; use strict; use warnings;
my @foo=('hello', 'world', 'foo', 'bar', 'hello world', 'HeLlo');
my $patterns=join(',',@foo);
for my $str (qw(quux world hello hEllO)) {
my $count=map {m/^$str$/i} @foo;
if ($count) {
print "I found '$str' $count time(s) in '$patterns'\n";
} else {
print "I could not find '$str' in the pattern list\n"
};
}
Output:
I could not find 'quux' in the pattern list
I found 'world' 1 time(s) in 'hello,world,foo,bar,hello world,HeLlo'
I found 'hello' 2 time(s) in 'hello,world,foo,bar,hello world,HeLlo'
I found 'hEllO' 2 time(s) in 'hello,world,foo,bar,hello world,HeLlo'
This does not require a module.
Of course, it is less extensible and versatile than some of the code above.
I use this to match interactive user answers against a predefined set of case-insensitive answers.
I'm trying to write a short script in Perl to go through an array of strings provided by the user, check in a hash table to see if there are vowels in the strings, then return the strings minus the vowels. I know this would be easier to accomplish using regex, but the parameters for the problem state that a hash table, exists(), and split() must be used. This is the script I have so far:
my @vowels = qw(a e i o u A E I O U);
my %vowel;
foreach $v (@vowels) {
$vowel{$v} = undef;
}
foreach $word (@ARGV) {
my @letter_array = split(undef,$word);
}
foreach $letter (@letter_array) {
print($letter) if !exists($vowel{$letter})
}
print "\n"
Input: hello
Expected output: hll
Actual output: nothing
There are no error messages, so I know it's not a syntax error.
Any ideas what I'm messing up? I'm much more comfortable with Python and this is one of my first attempts at Perl.
An alternative and more compact method of achieving the same thing is to use the substitute operator, "s" with a regular expression that matches the vowels.
Here is an example
use strict;
use warnings;
for my $word (@ARGV)
{
print $word =~ s/[aeiou]//gri;
}
or more succinctly like this
use strict;
use warnings;
for (@ARGV)
{
print s/[aeiou]//gri;
}
Key points to note
the regular expression uses the Character Class [aeiou] to match a single lower-case vowel.
the substitute operator has been given three options
the i option to force a case insensitive match. This means the Character Class [aeiou] will match both uppercase and lower-case vowels.
the g option to make the substitute match all instances of the regular expression -- in this instance it will match against all the vowels in the string.
the r option (which is a newish addition to Perl) to get the substitute operator to return the substituted string.
running that gives this
$ perl try.pl hello world
hllwrld
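You should use strict so that mistakes in the visibility (scope) of your variables are caught.
If you require Perl 5.12 or higher (use 5.012;), strict is enabled automatically.
The @letter_array you declared with my exists only inside the foreach $word (@ARGV) loop. The later loop iterates over a different, empty variable of the same name, which is why nothing is printed.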
If you want to fix that you'll get the following code:
#!/usr/bin/env perl
use strict;
use warnings;
my @vowels = qw( a e i o u y A E I O U Y );
my %vowel;
foreach my $v (@vowels) {
$vowel{$v} = undef;
}
my @letter_array;
foreach my $word (@ARGV) {
@letter_array = split //, $word;
}
foreach my $letter (@letter_array) {
print($letter) if !exists($vowel{$letter})
}
print "\n"
But this code is still not practical.
If you get more than one word in the input, you'll show only the last one, because @letter_array is overwritten each time.
You can use map to build the hash of vowels much more easily, without extra variables.
You can use fewer loops if you handle each word right after reading it.
You can also use unless instead of if not, which reads better and is more Perl-style.
Don't use split on undef. Use split //, $word instead.
You can use for instead of foreach because it's the same but shorter :)
So you can get an optimised solution:
#!/usr/bin/env perl
use 5.012;
use warnings;
my %vowels = map { $_ => undef } qw( a e i o u y A E I O U Y );
for my $word (@ARGV) {
my @letters = split //, $word;
for my $letter (@letters) {
print $letter unless exists $vowels{$letter};
}
print ' ';
}
print "\n"
Result:
$ perl delete_vowels.pl hello world
hll wrld
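If you prefer to build each result string before printing it, the same hash, exists and split ingredients can be combined with grep and join. This is only a sketch: the helper name strip_vowels is made up, and it uses the vowel set from the question (without y).

```perl
use strict;
use warnings;

# Lookup table of vowels; only the keys matter.
my %vowels = map { $_ => undef } qw( a e i o u A E I O U );

# Hypothetical helper: return $word with every vowel removed.
sub strip_vowels {
    my ($word) = @_;
    # split // yields single characters; grep keeps the non-vowels.
    return join '', grep { !exists $vowels{$_} } split //, $word;
}

print join(' ', map { strip_vowels($_) } @ARGV), "\n";
```

Run as perl delete_vowels.pl hello world, this should print the same line as the version above, minus the trailing space.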
I'm new to perl and I tried to replace my foreach-statement (version 1):
use warnings;
use strict;
$cmd_list = "abc network xyz";
foreach my $item (split(" ", $cmd_list)) {
if( $item eq "network") {
$PRINT_IP = 1;
}
}
with a grep (version 2, from some example in the internet) which should give me the count (because of scalar context) of the value "network" in a string array:
$PRINT_IP = grep(/^$network$/, split(" ", $cmd_list));
for version 1 the if statement works as supposed, but for version 2 it always evaluates to false:
if($PRINT_IP) {
...
}
Where is my fault?
There seems to be a typo, as $network is a variable; you may mean /^network$/.
Having use strict; in your program would have alerted you to an unintended (and so undeclared) variable. Having use warnings; would have alerted you to the use of an uninitialized variable in regex compilation.
In the loop you only set the variable $PRINT_IP (to 1) if any element matches. List::Util has a function for exactly that:
my $PRINT_IP = any { $_ eq 'network' } split ' ', $cmd_list;
or
my $PRINT_IP = any { /^network\z/ } split ' ', $cmd_list;
if you need regex for more complex conditions.
This returns 1 on the first match, the result that your for loop produces. If you actually need a count then indeed use grep. When there's no match $PRINT_IP is set to '', an empty string.
The library is more efficient, firstly since it stops processing once a match happens. You can also do that by adding last in your if condition but List::Util routines are generally more efficient.
More importantly: please always have use warnings; and use strict; at the beginning.
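A small side-by-side sketch of the two approaches, using a made-up command list in which "network" appears twice. It assumes List::Util 1.33 or later for any.

```perl
use strict;
use warnings;
use List::Util 1.33 'any';

my $cmd_list = "abc network xyz network";
my @items    = split ' ', $cmd_list;

# grep in scalar context returns the number of matching elements.
my $count = grep { $_ eq 'network' } @items;

# any returns true as soon as one element matches.
my $found = any { $_ eq 'network' } @items;

print "count=$count found=", ( $found ? 'yes' : 'no' ), "\n";   # count=2 found=yes
```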
Is there an opposite for the operator ~~ in Perl? I used it to match an element in an array like this:
my @arr = qw /hello ma duhs udsyyd hjgdsh/;
print "is in\n" if ('duhs' ~~ @arr);
This prints is in. It is pretty cool, because I don't have to iterate over the whole array and compare each record. My problem is that I want to do something if I don't have a match. I could use the else branch, but I would rather find an opposite of ~~.
You can also use List::Util (newer versions of this module only) or List::MoreUtils with any, none and friends.
use List::Util qw(any none);
use feature 'say';
my @arr = qw /hello ma duhs udsyyd hjgdsh/;
say "oh hi" if any { $_ eq 'hello' } @arr;
say "no goodbyes?" if none { $_ eq 'goodbye' } @arr;
While not perl-native, it doesn't need the experimental smartmatching.
unless == if not
print "is not in\n" unless ('duhs' ~~ @arr);
Note: Smart matching is experimental in perl 5.18+. See Smart matching is experimental/depreciated in 5.18 - recommendations? So use the following instead:
print "is not in\n" unless grep { $_ eq 'duhs' } @arr;
Need help figuring out working perl code to put in place of "any of the elements in @array"
%hash = (key1 => 'value1',key2 => 'value2',key3 => 'value3',);
@array= ('value3','value4','value6');
if ($hash{ 'key1' } ne <<any of the elements in @array>>) {print "YAY!";}
CPAN solution: use List::MoreUtils
use List::MoreUtils qw{any};
print "YAY!" if any { $hash{'key1'} eq $_ } @array;
Why use this solution over the alternatives?
Can't use smart match in Perl before 5.10
grep solution loops through the entire list even if the first element of 1,000,000 long list matches. any will short-circuit and quit the moment the first match is found, thus it is more efficient.
A 5.10+ solution: Use a smart-match!
say 'Modern Yay!' unless $hash{$key} ~~ @array;
You could use the grep function. Here's a basic example:
print "YAY!" if grep { $hash{'key1'} eq $_ } @array;
In a scalar context like this grep will give you the number of matching entries in @array. If that's non-zero, you have a match.
You could also use a hash:
my %in_array;
@in_array{"value3","value4","value6"} = ();
print "YAY" if exists $in_array{ $hash{key1} };
I have a list of possible values:
@a = qw(foo bar baz);
How do I check in a concise way that a value $val is present or absent in @a?
An obvious implementation is to loop over the list, but I am sure TMTOWTDI.
Thanks to all who answered! The three answers I would like to highlight are:
The accepted answer - the most "built-in" and backward-compatible way.
RET's answer is the cleanest, but only good for Perl 5.10 and later.
draegtun's answer is (possibly) a bit faster, but requires using an additional module. I do not like adding dependencies if I can avoid them, and in this case do not need the performance difference, but if you have a 1,000,000-element list you might want to give this answer a try.
If you have perl 5.10, use the smart-match operator ~~
print "Exist\n" if $var ~~ @array;
It's almost magic.
Perl's built-in grep() function is designed to do this.
@matches = grep( /^MyItem$/, @someArray );
or you can insert any expression into the matcher
@matches = grep( $_ eq $val, @a );
This is answered in perlfaq4's answer to "How can I tell whether a certain element is contained in a list or array?".
To search the perlfaq, you could search through the list of all questions in perlfaq using your favorite browser.
From the command line, you can use the -q switch to perldoc to search for keywords. You would have found your answer by searching for "list":
perldoc -q list
(portions of this answer contributed by Anno Siegel and brian d foy)
Hearing the word "in" is an indication that you probably should have used a hash, not a list or array, to store your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren't.
That being said, there are several ways to approach this. In Perl 5.10 and later, you can use the smart match operator to check that an item is contained in an array or a hash:
use 5.010;
if( $item ~~ @array )
{
say "The array contains $item"
}
if( $item ~~ %hash )
{
say "The hash contains $item"
}
With earlier versions of Perl, you have to do a bit more work. If you are going to make this query many times over arbitrary string values, the fastest way is probably to invert the original array and maintain a hash whose keys are the first array's values:
@blues = qw/azure cerulean teal turquoise lapis-lazuli/;
%is_blue = ();
for (#blues) { $is_blue{$_} = 1 }
Now you can check whether $is_blue{$some_color}. It might have been a good idea to keep the blues all in a hash in the first place.
If the values are all small integers, you could use a simple indexed array. This kind of an array will take up less space:
@primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
@is_tiny_prime = ();
for (@primes) { $is_tiny_prime[$_] = 1 }
# or simply @is_tiny_prime[@primes] = (1) x @primes;
Now you check whether $is_tiny_prime[$some_number].
If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings instead:
@articles = ( 1..10, 150..2000, 2017 );
undef $read;
for (@articles) { vec($read,$_,1) = 1 }
Now check whether vec($read,$n,1) is true for some $n.
These methods guarantee fast individual tests but require a re-organization of the original list or array. They only pay off if you have to test multiple values against the same array.
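For the bit-string variant, a self-contained sketch might look like the following. The helper name is_read is invented for this example, and the article numbers are the ones used above.

```perl
use strict;
use warnings;

my @articles = ( 1 .. 10, 150 .. 2000, 2017 );

# One bit per article number; vec grows the string as needed.
my $read = '';
vec( $read, $_, 1 ) = 1 for @articles;

# Hypothetical helper: true if article $n has its bit set.
sub is_read {
    my ($n) = @_;
    return vec( $read, $n, 1 );
}

printf "%d: %s\n", $_, ( is_read($_) ? 'read' : 'unread' ) for 5, 11, 2017;
print "bit string uses ", length($read), " bytes\n";
```

Reading a bit beyond the end of the string simply yields 0, so numbers larger than anything stored are reported as unread.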
If you are testing only once, the standard module List::Util exports the function first for this purpose. It works by stopping once it finds the element. It's written in C for speed, and its Perl equivalent looks like this subroutine:
sub first (&@) {
my $code = shift;
foreach (@_) {
return $_ if &{$code}();
}
undef;
}
If speed is of little concern, the common idiom uses grep in scalar context (which returns the number of items that passed its condition) to traverse the entire list. This does have the benefit of telling you how many matches it found, though.
my $is_there = grep $_ eq $whatever, @array;
If you want to actually extract the matching elements, simply use grep in list context.
my @matches = grep $_ eq $whatever, @array;
Use the first function from List::Util which comes as standard with Perl....
use feature 'say';
use List::Util qw/first/;
my @a = qw(foo bar baz);
if ( first { $_ eq 'bar' } @a ) { say "Found bar!" }
NB. first returns the first element it finds and so doesn't have to iterate through the complete list (which is what grep will do).
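One thing to watch for when using first as a yes/no test: because it returns the matching element itself, a match whose value is false (0, '0' or the empty string) looks like "not found". The sketch below, with made-up sample data, shows the pitfall and two ways around it. It assumes List::Util 1.33 or later for any.

```perl
use strict;
use warnings;
use List::Util 1.33 qw(first any);

my @numbers = ( 3, 0, 7 );

# first returns the element itself, here the number 0.
my $hit = first { $_ == 0 } @numbers;

# Testing the returned value for truth gives the wrong answer for 0.
print "truth test:   ", ( $hit ? 'found' : 'not found' ), "\n";           # not found
# Testing for definedness works, as long as the list holds no undef.
print "defined test: ", ( defined $hit ? 'found' : 'not found' ), "\n";   # found
# any returns a plain boolean and avoids the problem.
print "any test:     ", ( ( any { $_ == 0 } @numbers ) ? 'found' : 'not found' ), "\n";   # found
```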
One possible approach is to use List::MoreUtils 'any' function.
use List::MoreUtils qw/any/;
my @array = qw(foo bar baz);
print "Exist\n" if any {($_ eq "foo")} @array;
Update: corrected based on zoul's comment.
Interesting solution, especially for repeated searching:
my %hash;
map { $hash{$_}++ } @a;
print $hash{$val};
$ perl -e '@a = qw(foo bar baz);$val="bar";
if (grep{$_ eq $val} @a) {
print "found"
} else {
print "not found"
}'
found
$val='baq';
not found
If you would rather avoid an unnecessary dependency, implement any or first yourself:
sub first (&@) {
my $code = shift;
$code->() and return $_ foreach @_;
undef
}
sub any (&@) {
my $code = shift;
$code->() and return 1 foreach @_;
undef
}