perl regular expressions with date - perl

Trying to validate the date of the format (YYYY_MM_DD). With the test variable set as 2012_4_123 it's printing "valid format" after script is run. It should give an "invalid error" message because in the regular expression the day part is checked to be atleast 1 digit and not more than 2 digits. Not sure how it's printing "valid format" as the output message.
my $test="2012_4_123";
if ($test !~ m/^(\d{4})_(\d{1,2})_(\d{1,2})/)
{
print "invalid format\n";
}
else
{
print "valid format\n";
}

-if ($test !~ m/^(\d{4})_(\d{1,2})_(\d{1,2})/)
+if ($test !~ m/^(\d{4})_(\d{1,2})_(\d{1,2})$/)

you're missing a $ at the end. it's matching the string "2012_4_12" because you didn't tell it to match the end of the string too. Your regex should be this.
$test !~ m/^(\d{4})_(\d{1,2})_(\d{1,2})$/

Simply adding $ solves the initial problem of allowing for more than two digits for the day, but introduces a more subtle bug: dates will now validate despite having a newline at the end. This may not matter depending on your application, but it can be avoided by using the regex in the following example:
use strict;
use warnings;
my #tests = (
'2012_4_123',
'2012_11_22',
"2012_11_22\n",
);
use Data::Dumper;
print Dumper \#tests;
foreach my $test (#tests) {
if ( $test !~ m/\A(\d{4})_(\d{1,2})_(\d{1,2})\z/smx )
{
print "invalid format\n";
}
else
{
print "valid format\n";
}
}
Note: /smx is recommended by Perl Best Practices and I write my regexes with it unless there's a specific need not to have it, but it may trip you up if you're not used to it.
/s and /m will allow you to process multiline strings more easily; /s because . will then match newlines and /m to allow you to use ^ and $ to match the start and end of a line respectively, and \A and \z will then match the start and end of the entire string.
/x is simply to allow whitespace and comments within a regex, though you'll need to escape whitespace if you're actually trying to match it.
In this case, it's using \z instead of $ that makes the difference irrespective of the use of /smx.
Also, it mightn't be a bad idea to look at a module to perform date validation rather than just date format validation (again, depending on what you're using this for). See this discussion on perlmonks.

Related

In Perl, can you use a variable for the whole of a match string?

I'm new to Perl, though not to programming, and am working through Learning Perl. The book has exercises to match successive lines of a small text file.
I had the idea of supplying match strings from STDIN, and going through the file for each one:
while(<STDIN>) {
chomp;
$regex = $_;
seek JUNK, 0, 0;
while(<JUNK>) {
chomp();
if(/$regex/) {
say;
}
}
say '';
}
This works fine, but I can't find a way to interpolate an entire match string, e.g.
/fred/i
into the predicate. I tried
if($$matcher) # with $matcher = '/fred/'
but Perl complained.
I imagine this is my ignorance, and should welcome enlightenment.
Statement modifiers, such as /i, are a part of the code telling Perl how to perform the match, not a part of the pattern to be matched. This is why that doesn't work for you.
You have three ways to work around this (well, probably more, since this is Perl we're talking about, but three ways that I can think of straight off):
1) Use extended regex syntax and, when you want a case-insensitive match, enter (?i:fred), as suggested in comments on the question.
2) Use string eval to allow the use of the regular statement modifiers: if (eval "$_ =~ $regex") { say } Note that this method will require you to also type the surrounding slashes. e.g., You'd have to enter /fred/i; just typing in fred would not work. Note also that it's a huge security hole to do this without validating your input first, since the user's entered text is executed as Perl code, just as if it were part of the original program. (Imagine if the user entered //, system("rm -rf /") - it would test against an empty regex, then delete all the files on your computer.) So probably not a recommended approach unless you really know what you're doing and/or you're the only one who will ever run the program.
3) The most complex, but also most correct, solution is to write a parser which inspects the user's entered string to see whether any special flags are present and then responds accordingly. A very simple example which allows the user to append /i for a case-insensitive search:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
while(<STDIN>) {
chomp;
my #parts = split '/', $_;
# If the user input starts with a /, the first part will be empty, so throw
# it away.
shift #parts unless $parts[0];
my $re = shift #parts;
my %flags;
for (#parts) {
for (split '') {
$flags{i} = 1 if $_ eq 'i';
}
}
my $f = join '', keys %flags;
say "Matched" if eval qq('foo' =~ /$re/$f);
}
This also uses string eval, so it is potentially vulnerable to the same kind of security issues as #2, but $re cannot contain any / characters (the split '/' would have ended $re immediately prior to the first /), which prevents code from being inserted there and $f can contain only the letter i (or any other flags you might choose to recognize if you expand on this). So it should be safe. (But, if anyone can demonstrate an exploit I missed, please tell me about it in comments!)
Problem
What you are trying to do can be summarized by:
my $regex = '/fred/i';
my #lines = (
'A line containing some words and Fred said Hello.',
'Another line. Here is a regex embedded in the line: /fred/i',
);
for ( #lines ) {
say if /$regex/;
}
Output:
Another line. Here is a regex embedded in the line: /fred/i
We see that the second line matches $regex, whereas we wanted the first line containing Fred to match the string fred with the (case insensitive) i flag added to the regex. The problem is that the characters / and i in $regex are taken as characters to be matched literally, i.e., they are not interpreted as special characters surrounding a Regex (as part of a Perl expression).
Note:
The character / is special as part of a Perl expression for a regular expression, but it is not special inside the Regex pattern. There are however characters that are special inside the pattern, the so-called meta characters:
\ | ( ) [ { ^ $ * + ? .
see perldoc quotemeta for more information.
A solution using extended patterns
Simply change the first line to:
my $regex = '(?i)fred'; # or alternatively: (?i:fred)
Regex flags can be added to a regex pattern using "Extended patterns" described in the manual perldoc perlre :
Extended Patterns
The syntax for most of these is a pair of parentheses with a question
mark as the first thing within the parentheses. The character after
the question mark indicates the extension.
[...]
(?adlupimnsx-imnsx)
(?^alupimnsx)
One or more embedded pattern-match modifiers, to be turned on (or
turned off if preceded by "-" ) for the remainder of the pattern or
the remainder of the enclosing pattern group (if any). This is
particularly useful for dynamically-generated patterns, such as those
read in from a configuration file, taken from an argument, or
specified in a table somewhere.
[...]
These modifiers are restored at the end of the enclosing group.
Alternatively the non-capturing form can be used:
(?:pattern)
(?adluimnsx-imnsx:pattern)
(?^aluimnsx:pattern)
This is for clustering, not capturing; it groups subexpressions like
"()" , but doesn't make backreferences as "()" does.
The question has been answered in the following comment:
Try (?i:fred), see Extended
patterns in
perldoc perlre for more information
– Håkon Hægland 7 hours ago.

Search special String pattern using Binding operator in Perl

I have one perl script which parse the log and search for perticular pattern like Error, Fatal and decide the Pass and Fail status.
Coming to my question I need to search uniq pattern like "Error (E302/FEP0512SRA)" if such pattern is presnet in any line of Log file it should increase error_cnt by 1.
I tried "\" and Below approch but in both case i got fail to capture above mention patter.
my $str = "Error (E302/FEP0512SRA)";
if($line =~ /$str/) {
$error_cnt++;
}
Please let me know what else i can do so i can catch this string in my log.
Your string contains regex metacharacters (the parentheses). You should either escape them by hand, or use quotemeta:
my $str = quotemeta "Error (E302/FEP0512SRA)";
Regexes contain many operators, so-called metacharacters. Parens () are such metacharacters and have to be escaped. Perl provides the quotemeta function for that. Inside a regex, we can use the equivalent \Q...\E environment, which takes care of that for us. Then:
$error_cnt++ if $line =~ /\Q$str\E/;

Ignoring Hidden Files with File::Find

I am using file::find to walk a directory structure and print it out, but I am having trouble excluding hidden files. Here is what I have so far:
find(\&todo, $start_dir);
sub todo
{
if ($_ =~ /^./)
{
print "hidden file $_\n";
}
else
{
if (-f $_) #check for file
{
file;
}
elsif (-d $_) #check for directory
{
directory($File::Find::dir);
}
else
{
print "ERROR: $_\n";
}
}
}
If I remove the if ($_ =~ /^./) check, the files and directories work fine, but adding this prints everything as a hidden file. As you can see, I only need this to work on unix.
Can anyone point me in the right direction?
EDIT: I forgot a backslash in front of the . - should be if ($_ =~ /^./), but does find() have a default way of ignoring hidden files/directories?
Thanks!
. in a regex matches any character; use \. to match a literal .. And you probably should learn about regexes.
See geekosaur for an explanation of your problem. In a simple case like this, substr might be a better call than a regular expression:
if(substr($_, 0, 1) eq '.') {
Regular expressions are a great tool but they shouldn't be the only thing in your toolbox.
The /^./ is a regular expression. The period means any single character, so what you're saying is match any string that starts with any character, and that pretty much matches all file names.
You need to put a backslash before the period, or use the \Q and '\E'. The \Q disables matching on metacharacters which means it basically removes all magic and makes everything a plain ol' string. In this circumstance, the backslash would be better, but you can imagine trying to match something a bit more complex, and the \Q and \E would work better:
Either of these will work:
if ($_ =~ /^\./)
{
print "hidden file $_\n";
}
if ($_ =~ /^\Q.\E/)
{
print "hidden file $_\n";
}
If you are on Windows the perl module Win32::File will tell you whether a file is hidden or not. Win32::File is installed by default in ActivePerl.
Unfortunately Win32::File doesn't come with any examples, (I wish cpan added a comment feature to every page like the PHP site. Sure there's Annocpan, but that extra click means almost no one ever contributes to it), but this thread will help http://www.perlmonks.org/?node_id=194011
There's also the nicer object oriented Win32::File::Object but it only works with files unless you apply this patch :( https://rt.cpan.org/Public/Bug/Display.html?id=60735
Results seem weird though as lots of files you don't except to be hidden or system turn out to be both. I gave up in the end and just hacked this to ignore certain folders:
if( $_ =~ /(RECYCLER)|(System Volume Information)/ ) {
$File::Find::prune = 1;
}

Need to print the last occurrence of a string in Perl

I have a script in Perl that searches for an error that is in a config file, but it prints out any occurrence of the error. I need to match what is in the config file and print out only the last time the error occurred. Any ideas?
Wow...I was not expecting this much of a response. I should've been more clear in stating this is for log monitoring on a windows box that sends an alert to Nagios. This is actually my first Perl program and all this information has been very helpful. Does anyone know how I can apply this any of the tail answers on a wintel box?
Another way to do it:
perl -n -e '$e = $1 if /(REGEX_HERE)/; END{ print $e }' CONFIG_FILE_HERE
What exactly do you need to print? The line containing the error? More context than that?
File::ReadBackwards can be helpful.
In outline:
my $errinfo;
while (<>)
{
$errinfo = "whatever" if (m/the error pattern/);
}
print "error: $errinfo\n" if ($errinfo);
This catches all errors, but doesn't print until the end, when only the last one survives.
A brute-force approach involves setting up your own pipeline by pointing STDOUT to tail. This allows you to print all errors, and then it's up to tail to worry about only letting the last one out.
You didn't specify, so I assume a legal config line is of the form
Name = some value
Matching that is straightforward:
^ (starting at the beginning of line)
\w+ (one or more “word characters”)
\s+ (followed by mandatory whitespace)
= (followed by an equals sign)
\s+ (more mandatory whitespace)
.+ (some mandatory value)
$ (finishing at the end of the line)
Gluing it together, we get
#! /usr/bin/perl
use warnings;
use strict;
# for demo only
*ARGV = *DATA;
my $pid = open STDOUT, "|-", "tail", "-1" or die "$0: open: $!";
while (<>) {
print unless /^ \w+ \s+ = \s+ .+ $/x;
}
close STDOUT or warn "$0: close: $!";
__DATA__
This = assignment is ok
But := not this
And == definitely not this
Output:
$ ./lasterr
And == definitely not this
With regular expressions, when you want the last occurrence of a pattern, place ^.* at the front of your pattern. For example, to replace the last X in the input with Y, use
$ echo XABCXXXQQQXX | perl -pe 's/^(.*)X/$1Y/'
XABCXXXQQQXY
Note that the ^ is redundant because regular-expression quantifiers are greedy, but I like having it there for emphasis.
Applying this technique to your problem, you can search for the last line in your config file that contains an error as in the following program:
#! /usr/bin/perl
use warnings;
use strict;
local $_ = do { local $/; scalar <DATA> };
if (/\A.* ^(?! \w+ \s+ = \s+ [^\r\n]+ $) (.+?)$/smx) {
print $1, "\n";
}
__DATA__
This = assignment is ok
But := not this
And == definitely not this
The syntax of the regular expression is a bit different because $_ contains multiple lines, but the principle is the same. \A is similar to ^, but it matches only at the beginning of string to be searched. With the /m switch (“multi-line”), ^ matches at logical line boundaries.
Up to this point, we know the pattern
/\A.* ^ .../
matches the last line that looks like something. The negative look-ahead assertion (?!...) looks for a line that is not a legal config line. Ordinarily . matches any character except newline, but the /s switch (“single line”) lifts this restriction. Specifying [^\r\n]+, that is, one or more characters that are neither carriage return nor line feed, does not allow the match to spill into the next line.
Look-around assertions do not capture, so we grab the offending line with (.+?)$. The reason it's safe to use . in this context is because we know the current line is bad and the non-greedy quantifier +? stops matching as soon as it can, which in this case is the end of the current logical line.
All these regular expressions use the /x switch (“extended mode”) to allow extra whitespace: the aim is to improve readability.

Why does this base64 string comparison in Perl fail?

I am trying to compare an encode_base64('test') to the string variable containing the base64 string of 'test'. The problem is it never validates!
use MIMI::Base64 qw(encode_base64);
if (encode_base64("test") eq "dGVzdA==")
{
print "true";
}
Am I forgetting anything?
Here's a link to a Perlmonks page which says "Beware of the newline at the end of the encode_base64() encoded strings".
So the simple 'eq' may fail.
To suppress the newline, say encode_base64("test", "") instead.
When you do a string comparison and it fails unexpectedly, print the strings to see what is actually in them. I put brackets around the value to see any extra whitespace:
use MIME::Base64;
$b64 = encode_base64("test");
print "b64 is [$b64]\n";
if ($b64 eq "dGVzdA==") {
print "true";
}
This is a basic debugging technique using the best debugger ever invented. Get used to using it a lot. :)
Also, sometimes you need to read the documentation for things a couple time to catch the important parts. In this case, MIME::Base64 tells you that encode_base64 takes two arguments. The second argument is the line ending and defaults to a newline. If you don't want a newline on the end of the string you need to give it another line ending, such as the empty string:
encode_base64("test", "")
Here's an interesting tip: use Perl's wonderful and well-loved testing modules for debugging. Not only will that give you a head start on testing, but sometimes they'll make your debugging output a lot faster. For example:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More 0.88;
BEGIN { use_ok 'MIME::Base64' => qw(encode_base64) }
is( encode_base64("test", "dGVzdA==", q{"test" encodes okay} );
done_testing;
Run that script, with perl or with prove, and it won't just tell you that it didn't match, it will say:
# Failed test '"test" encodes okay'
# at testbase64.pl line 6.
# got: 'gGVzdA==
# '
# expected: 'dGVzdA=='
and sharp-eyed readers will notice that the difference between the two is indeed the newline. :)