With perl, I want to find a substring in a string. The result is a True/False and then I will decide what to do. There is good post here, but the usage is vague for me.
my $big_string = "Hello Good World";
my $pat = "world";
While the capital/non-capital letters are not important, I want to get True.
use List::Util 'any';
my $match_found = any { /$pat/ } #big_string;
if (match_found)
print "yes\n";
else
print "no\n";
Is that correct? Is there any better API for this purpose?
It's not correct because it doesn't even compile: if / else are always followed by a { } block in Perl. Also, if (match_found) is missing the $ sigil on the variable.
If you really want case insensitive matching (i.e. to ignore the differences between uppercase and lowercase letters), you need to add the i flag to the regex.
Finally, your code doesn't define a #big_string array, only a $big_string scalar.
So:
use strict;
use warnings;
use List::Util 'any';
my $big_string = "Hello Good World";
my $pat = "world";
my $match_found = any { /$pat/i } $big_string;
if ($match_found) {
print "yes\n";
} else {
print "no\n";
}
This code would work, but it can be improved.
First off, why use any at all? We don't have a list of multiple strings to check:
my $match_found = $big_string =~ /$pat/i;
Second, $pat doesn't look like it's supposed to be a regex. It's a plain string. This doesn't make any difference for alphanumeric characters (such as world), but in general we should escape all regex metacharacters in strings that aren't intended to be interpreted as regexes. This can be done with the quotemeta function (or its \Q \E short form):
my $match_found = $big_string =~ /\Q$pat\E/i;
This is our improved version:
use strict;
use warnings;
my $big_string = "Hello Good World";
my $pat = "world";
my $match_found = $big_string =~ /\Q$pat\E/i;
if ($match_found) {
print "yes\n";
} else {
print "no\n";
}
Finally, we don't even need a regex for a simple substring search. Instead we can do this:
use feature 'fc';
my $match_found = index(fc($big_string), fc($pat)) >= 0;
fc implements full Unicode case folding.
Related
I would like to show in result three words "hidden". I don't know how to do this. Right now, when I start the program, I don't have effect. I see only my text. How to repair it ? I want to create simple program. I have my string and I want to show words which I want to find in my result.
use strict;
my $text='Try to find the hidden string here! Hidden or hidden';
my $a = m/\bhidden\b/;
if ($text=~ $a) {
print "I found word: $a";
}
else {
print "No match, sorry.\n";
}
Let's see how we could debug this problem.
Your first step should be to add use warnings to your code. You should always include both use strict and use warnings. They are like Perl's safety nets and only a foolhardy programmer codes without them.
#!/usr/bin/perl
use strict;
use warnings; # ADDED THIS
my $text='Try to find the hidden string here! Hidden or hidden';
my $a = m/\bhidden\b/;
if ($text=~ $a) {
print "I found word: $a";
}
else {
print "No match, sorry.\n";
}
Now we get a warning:
Use of uninitialized value $_ in pattern match (m//) at test_re line 8.
Line 8 is this:
if ($text=~ $a) {
So let's see what is in $a (as an aside - please don't use $a as a variable name. Firstly, it's a terrible, generic name; and, secondly, Perl uses it as a special variable in the sort function):
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say'; # ADDED THIS
my $text='Try to find the hidden string here! Hidden or hidden';
my $a = m/\bhidden\b/;
say "\$a is [$a]"; # ADDED THIS
if ($text=~ $a) {
print "I found word: $a";
}
else {
print "No match, sorry.\n";
}
Now, as well as our warning, we get this output:
$a is []
I don't think you're expecting $a to be empty at that point, are you? We can go further and determine whether it's an empty string or an undefined value.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $text='Try to find the hidden string here! Hidden or hidden';
my $a = m/\bhidden\b/;
say "\$a is [$a]";
say "\$a is ", (defined $a ? 'defined' : 'undefined'); # ADDED THIS
if ($text=~ $a) {
print "I found word: $a";
}
else {
print "No match, sorry.\n";
}
And we see:
$a is defined
So it's an empty string. What do you expect it to be?
Here's where is it set?
my $a = m/\bhidden\b/;
I'm not sure what you expected this to do, but it actually means "match \bhidden\b and assign the result of the match to $a". And what does it match against that regex? Well, m/.../ matches the contents of$` by default. (That, incidentally, explains our original "uninitialized value $" warning - Perl just got the line number wrong by one.)
So what actually happens on that line is that Perl tries to match \bhidden\b against $_. This fails (as $_ is undefined) and the match returns a false value (the empty string) that gets assigned to $a.
Then your code goes on to run this:
if ($text=~ $a) {
And, as $a is the empty string, it matches (any string will match against an empty string and Perl tells you it has matched $a (which still contains the empty string).
I'm not sure what you were trying to do with this line:
my $a = m/\bhidden\b/;
Perhaps you were trying "pre-compile" the regex in some way. In which case, you were looking for qr/.../, not m/.../.
So maybe you wanted this:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $text = 'Try to find the hidden string here! Hidden or hidden';
my $a = qr/\bhidden\b/; # CHANGED THIS
say "\$a is [$a]";
say "\$a is ", (defined $a ? 'defined' : 'undefined');
if ($text =~ $a) {
print "I found word: $a";
}
else {
print "No match, sorry.\n";
}
Which gives this output:
$a is [(?^:\bhidden\b)]
$a is defined
I found word: (?^:\bhidden\b)
Or, if you want to show what has been matched, rather than the regex that you are matching, then you need to "capture" that text:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $text = 'Try to find the hidden string here! Hidden or hidden';
my $a = qr/\b(hidden)\b/; # CHANGED THIS
say "\$a is [$a]";
say "\$a is ", (defined $a ? 'defined' : 'undefined');
if (my ($matched) = $text =~ $a) { # CHANGED THIS
print "I found word: $matched"; # CHANGED THIS
}
else {
print "No match, sorry.\n";
}
Which gives us:
$a is [(?^:\b(hidden)\b)]
$a is defined
I found word: hidden
Ah, but you want to see all of the words. So capture the matches in an array and use the /g option to match multiple times:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $text = 'Try to find the hidden string here! Hidden or hidden';
my $a = qr/\b(hidden)\b/;
say "\$a is [$a]";
say "\$a is ", (defined $a ? 'defined' : 'undefined');
if (my (#matched) = $text =~ /$a/g) { # CHANGED THIS
print "I found word: #matched"; # CHANGED THIS
}
else {
print "No match, sorry.\n";
}
We now get this:
$a is [(?^:\b(hidden)\b)]
$a is defined
I found word: hidden hidden
That only matches two instances of "hidden" as the match is case-sensitive. So turn off the case sensitivity with /i.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $text = 'Try to find the hidden string here! Hidden or hidden';
my $a = qr/\b(hidden)\b/i; # CHANGED THIS
say "\$a is [$a]";
say "\$a is ", (defined $a ? 'defined' : 'undefined');
if (my (#matched) = $text =~ /$a/g) {
print "I found word: #matched";
}
else {
print "No match, sorry.\n";
}
Which gives us:
$a is [(?^i:\b(hidden)\b)]
$a is defined
I found word: hidden Hidden hidden
But, honestly, I think pre-compiling the regex here is just over-complicating things. I'd write this code like this (removing the $a variable completely):
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $text = 'Try to find the hidden string here! Hidden or hidden';
if (my (#matched) = $text =~ /\b(hidden)\b/gi) {
print "I found word: #matched";
}
else {
print "No match, sorry.\n";
}
Perhaps you are looking for this piece of code
use strict;
use warnings;
use feature 'say';
my $text='Try to find the hidden string here! Hidden or hidden';
my $re = qr/(hidden)/i; # () capture found ignoring case
my #match = $text =~ /$re/g; # g specifies to look for all accurances
say for #match; # output each element of #match
Output
hidden
Hidden
hidden
I am new in Perl and trying to catch hold of the scripting language where I come across regular expression to validate a email address. I am sharing the perl script. I am not sure where I am making mistake in it. The \# part is omitted always following which the correct email id is also showing as invalid.
Here is the code :
#!/usrs/bin/perl/
$string = "XYZ#yahoo.com";
if ( $string =~ /([a-zA-Z]+)\#([a-zA-Z]+)\.(com|net|org)/)
{
print "TRUE";
print $string;
}
else
{
print "FALSE";
print $string;
}
Thanks for your help.
The regex for validating an email address is included in the source code for Email::Valid. I've copied it below:
$RFC822PAT = <<'EOF';
[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\
xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xf
f\n\015()]*)*\)[\040\t]*)*(?:(?:[^(\040)<>#,;:".\\\[\]\000-\037\x80-\x
ff]+(?![^(\040)<>#,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\015
"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\
xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80
-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*
)*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\
\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\
x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>#,;:".\\\[\]\000-\037\x8
0-\xff]+(?![^(\040)<>#,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n
\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x
80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^
\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040
\t]*)*)*#[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([
^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\
\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>#,;:".\\\[\]\000-\037\
x80-\xff]+(?![^(\040)<>#,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-
\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()
]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\
x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\04
0\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\
n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\
015()]*)*\)[\040\t]*)*(?:[^(\040)<>#,;:".\\\[\]\000-\037\x80-\xff]+(?!
[^(\040)<>#,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\
]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\
x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\01
5()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*|(?:[^(\040)<>#,;:".
\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>#,;:".\\\[\]\000-\037\x80-\xff]
)|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^
()<>#,;:".\\\[\]\x80-\xff\000-\010\012-\037]*(?:(?:\([^\\\x80-\xff\n\0
15()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][
^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)|"[^\\\x80-\xff\
n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^()<>#,;:".\\\[\]\
x80-\xff\000-\010\012-\037]*)*<[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?
:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-
\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:#[\040\t]*
(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015
()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()
]*)*\)[\040\t]*)*(?:[^(\040)<>#,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\0
40)<>#,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\
[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\
xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*
)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80
-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x
80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t
]*)*(?:[^(\040)<>#,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>#,;:".\\
\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])
*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x
80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80
-\xff\n\015()]*)*\)[\040\t]*)*)*(?:,[\040\t]*(?:\([^\\\x80-\xff\n\015(
)]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\
\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*#[\040\t
]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\0
15()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015
()]*)*\)[\040\t]*)*(?:[^(\040)<>#,;:".\\\[\]\000-\037\x80-\xff]+(?![^(
\040)<>#,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|
\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80
-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()
]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x
80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^
\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040
\t]*)*(?:[^(\040)<>#,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>#,;:".
\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff
])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\
\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x
80-\xff\n\015()]*)*\)[\040\t]*)*)*)*:[\040\t]*(?:\([^\\\x80-\xff\n\015
()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\
\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)?(?:[^
(\040)<>#,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>#,;:".\\\[\]\000-
\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\
n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|
\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))
[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff
\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\x
ff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(
?:[^(\040)<>#,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>#,;:".\\\[\]\
000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\
xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\x
ff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)
*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*#[\040\t]*(?:\([^\\\x80-\x
ff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-
\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)
*(?:[^(\040)<>#,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>#,;:".\\\[\
]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\]
)[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-
\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\x
ff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(
?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80
-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<
>#,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>#,;:".\\\[\]\000-\037\x8
0-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:
\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]
*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)
*\)[\040\t]*)*)*>)
EOF
But your actual problem is this line:
$string = "XYZ#yahoo.com";
The #yahoo looks like an array variable to Perl, and and you don't have an array called #yahoo it gets replaced with an empty string. Try printing the value of $string to see what you get.
The solution is either to use single quotes (so the array variable isn't expanded):
$string = 'XYZ#yahoo.com';
Or to escape the # with a \:
$string = "XYZ\#yahoo.com";
Alway add use strict and use warnings to your Perl programs. They would have told you what the problem is.
#!/usr/bin/perl
use strict;
use warnings;
use Email::Valid;
my $email_address = 'a.n#example.com';
unless( Email::Valid->address($email_address) ) {
print "Sorry, that email address is not valid!";
}
Reference Site: http://learn.perl.org/examples/email_valid.html
For Regex Pattern try this:
my $pattern= '^([a-zA-Z][\w\_\.]{6,15})\#([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,4})$';
Reference Site: https://www.experts-exchange.com/articles/8652/Email-validation-using-Regular-Expression-in-Perl.html
Change your if condition to
if ($string =~ /^[a-z0-9A-Z][A-Za-z0-9.]+[A-Za-z0-9]\#[A-Za-z0-9.-]+$/)
and change
$string = "XYZ#yahoo.com"; to
$string = 'XYZ#yahoo.com';
Refference : http://perlmaven.com/email-validation-using-regular-expression-in-perl
for details.
Try
if ($email =~ /^[a-z0-9]([a-z0-9.]+[a-z0-9])?\#[a-z0-9.-]+$/)
And some test code:
#!/usr/bin/perl
my $email = "john.doe\#acme.org";
if ($email =~ /^[a-z0-9]([a-z0-9.]+[a-z0-9])?\#[a-z0-9.-]+$/) {
print "Valid email\n";
} else {
print "Not valid email\n";
}
exit;
Output:
Valid email
I have the following code in perl:
package SignatureScheme;
use strict;
use warnings;
use constant {
SHA256_RSA_V1 => 'SHA256-RSA-V1',
SHA256_HMAC_V1 => 'SHA256-HMAC-V1',
};
How can I loop through the constants listed above and compare them to a string?
For one-time comparison:
my $string = "xyz";
my $found = grep ($_ eq $string) (SHA256_RSA_V1, SHA256_HMAC_V1);
Constants are merely subroutines returning your strings, therefore you can use them alomst anywhere where you would have used the strings themselves.
For repeated comparisons, to improve performance, use hash lookups.
my %lookup_hash = map {($_=>1)} (SHA256_RSA_V1, SHA256_HMAC_V1);
foreach my $lookup_string (#lookup_strings) {
if ($lookup_hash{$lookup_string}) { #do your thing }
}
my #schemes = (SHA256_RSA_V1, SHA256_HMAC_V1);
foreach my $scheme (#schemes) {
if ($scheme eq $string) {
# do something
}
}
My perl script needs to detect the extension of an existing file and print out the filename. The input that specifies the filename with a vague extension would be in this format:
foo.(txt|abc)
and the script would print "foo.txt" if it exists. If foo.txt does not exist and foo.abc exists, then it would print "foo.abc."
How can I do this detection and printing of the correct existing file in a neat and clean way?
Thanks!
Actually, you've almost got the regular expression right there: the only thing you need to do is escape the . with a backslash (since . means "any character except the newline character" in regular expressions), and it would also help to put a ?: inside of the parentheses (since you don't need to capture the file extension). Also, ^ and $ denote markers for the beginning and the end of the string (so we're matching the entire string, not just part of a string...that way we don't get a match for the file name "thisisnotfoo.txt")
Something like this should work:
use strict;
use warnings;
my $file1="foo.txt";
my $file2="foo.abc";
my $file3="some_other_file";
foreach ($file1,$file2,$file3)
{
if(/^foo\.(?:txt|abc)$/)
{
print "$_\n";
}
}
When the above code is run, the output is:
foo.txt
foo.abc
Take a look at perldoc perlretut for more stuff about regular expressions.
You may want to look at glob, but you'd have to use a different syntax. The equivalent would be:
foo.{txt,abc}
See File::Glob for more information. Also note that this will return a list of all of the matches, so you'll have to do your own rules if it should prefer one when multiple exist.
sub text_to_glob {
my ($s) = #_;
$s =~ s/([\\\[\]{}*?~\s])/\\$1/g;
return $s;
}
my $pat = 'foo.(txt|abc)';
my #possibilities;
if (my ($base, $alt) = $pat =~ /^(.*\.)\(([^()]*)\)\z/s) {
#possibilities = glob(
text_to_glob($base) .
'{' . join(',', split(/\|/, $alt)) . '}'
);
} else {
#possibilities = $pat;
}
for my $possibility (#possibilities) {
say "$possibility: ", -e $possibility ? "exists" : "doesn't exist";
}
glob, but also see File::Glob
-e
use strict;
use warnings;
FILE:
for (glob "file.{txt,abc}") {
if (-f $_) {
print $_, "\n";
last FILE;
}
}
Part of the specification says "Some names are special, e.g. Hughie, Dewey, Louis, and Donald. Other names may be added over the lifetime of the project at arbitrary times. Whenever you input one of those names, play quack.wav."
I could write ...
while (<>) {
if ($_ =~ /Hughie|Dewey|Louis/) {
quack() ;
}
elsif ($_ =~ /Donald/ {
quack() ;
you_re_fired_apprentice() ; # Easter egg don't tell QA
}
}
... but though untaxing to implement, it looks WTF-y: Where's the binary search? What if there were a sudden stupendous increase in the number of duck names? It would not scale at all!
I could create empty files using those names in a temporary directory, and then use the "file exists" API, but that seems roundabout, and I would have to be sure they were deleted at the end.
Surely there is a better way?
You could write that, but you should write this:
my %ducks = map {$_ => 1} qw(Hughie Dewey Louis);
while (<>) {
if ($ducks{$_}) {
quack() ;
}
elsif ($_ eq 'Donald') {
quack() ;
you_re_fired_apprentice() ; # Easter egg don't tell QA
}
}
Creating the hash takes a little bit of time, but not more than O(n). Lookup with a hash is O(1) though, so it is much more efficient than sequential search (via grep or a regex with alternation) assuming you will be checking for more than one or two items.
By the way, the regex that you have will match the words anywhere in the search string. You need to add start and end anchors if you want an exact match.
Alternatively, you could use smart matching
my #ducks = qw(Hughie Dewey Louis);
my $name = 'Dewey';
say 'smart match' if $name ~~ #ducks;
This is what is used by switch statements, so you could write
given ($name) {
when (#ducks) {
quack();
}
when ('Donald') {
quack();
you_re_fired_apprentice(); # Easter egg don't tell QA
}
}
As mentioned, hashes are the way to go for this. This is sort of what OOP looked like before OOP.
use strict;
use warnings;
my %duck_action = (
Donald => sub {quack(); you_re_fired_apprentice()},
Hughie => sub {quack()},
Dewie => sub {quack()},
Louis => sub {quack()},
);
for my $duck (qw( Hughie Dewie Donald Louis Porkie )) {
print "$duck: ";
my $action = $duck_action{$duck} || &no_such_duck;
$action->();
}
sub quack {
print "Quack!\n";
}
sub you_re_fired_apprentice {
print "You're fired!\n";
}
sub no_such_duck {
print "No such duck!\n";
}
You can use a Perl Hash. See also How can I represent sets in Perl? and Representing Sets in Perl.
Using hashes to implement a set is not exactly pretty, but it should be fast.
To find a string in a list, you could also use any in List::MoreUtils
use List::MoreUtils qw(any);
my #ducks = qw(Hughie Dewey Louis);
my $name = 'Dewey';
say 'any' if any {$name eq $_} #ducks;
If you're tied to using an array rather than a hash, you can use perl's grep function to search the array for a string.
#specialnames = qw(Hughie Dewey Louis);
while (my $value = <>) {
if (grep {$value eq $_}, #specialnames) {
quack() ;
}
elsif ($_ =~ /Donald/ {
quack() ;
you_re_fired_apprentice() ; # Easter egg don't tell QA
}
}
This does scale a lot worse than a hash, and might even scale worse than copying the array into a hash and then doing hash lookups.